Discussions of research studies and tabular material are presented
chronologically (1900-1952) under each topic heading. Topics under
the major heading of "Criteria for Instructor Effectiveness" are
rating the effectiveness of instructors, administrator rating, peer
rating, student rating, self-rating, objective observation of
performance, and student change as a measure. Topics under "The
Predictors--Traits and Qualities Assumed to be Related to Instructor
Effectiveness" are intelligence; education; scholarship; age and
experience; knowledge of subject matter, present professional
information, and teacher examination scores; extracurricular
activities and general culture test scores; socioeconomic status,
sex, and marital status; teaching aptitude, attitude toward
teaching, and interest; voice and speech characteristics; the
photograph; statistical analyses of abilities; and personality
studies and tests. (JS)




RESEARCH BULLETIN

AFPTRC-TR-54-44



Identifying the Effective Instructor: A Review
of the Quantitative Studies, 1900-1952



By Joseph E. Morsh 
And Eleanor W. Wilder 



AIR FORCE PERSONNEL & TRAINING RESEARCH CENTER

LACKLAND AIR FORCE BASE • SAN ANTONIO • TEXAS







HEADQUARTERS

AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER

Air Research and Development Command
Lackland Air Force Base, Texas



at. Herbert H. Cewtu 



Dr. Arthur W. Melton

Technical Director



Dr. Cberlei W. Kr ey 
0*pv<r 

fur Irtrunh u*4 



Dr. Henry $. Otffctrl 
Vrtife# 

fur ?•<» tt> fAfRtrmel’eA 



RESEARCH BULLETIN AFPTRC-TR-54-44                      OCTOBER 1954



The armed services now require that large numbers of personnel be rapidly
trained to a high degree of technical competence in the operation and
maintenance of a variety of complex electronic and mechanical equipment.
The rapidity and effectiveness of training rest in large measure upon the
quality of instruction. Hence considerable importance is assumed by the
problems of locating individuals among military personnel who may be
trained to become effective instructors and of evaluating instructors on
the job. As a first step, the criteria suitable for determining instructor
proficiency must be ascertained, and the factors employed to predict
instructor effectiveness evaluated. In this Research Bulletin over three
hundred sixty civilian studies which relate to the evaluation and
prediction of instructor proficiency are reviewed and interpreted.

Dr. Joseph E. Morsh is a member of the Personnel Research Laboratory
of this Center. Mrs. Eleanor W. Wilder was formerly a member of the
Training Aids Research Laboratory. The survey of studies was made as
part of the regular research of the Training Aids Research Laboratory.






IDENTIFYING THE EFFECTIVE INSTRUCTOR:
A REVIEW OF THE QUANTITATIVE STUDIES
1900-1952



By Joseph E. Morsh
And Eleanor W. Wilder



Training Aids Research Laboratory
AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER
Air Research and Development Command
Chanute Air Force Base, Illinois



Approved by: 

Richard Faubion, Col, USAF, Director
Arthur A. Lumsdaine, Technical Director
Training Aids Research Laboratory

Project No. 771 1
Task No. 772^3



ACKNOWLEDGMENTS



This review is the outgrowth of a working bibliography which was assembled
in connection with Human Resources Research Center [illegible]
507-010-0005, "The Identification of the Characteristics and [illegible]
of the Successful Technical School Instructor." The authors are grateful
to [illegible] Swenson, who read the manuscript and offered [illegible].
Mrs. Katherine M. Zawadke supervised the setting up of the tables.







TABLE OF CONTENTS

List of Tables

Introduction

Principal Findings of Cited Research Studies
    Criteria
    Predictors

Criteria of Instructor Effectiveness
    Rating the Effectiveness of Instructors
    Administrative Rating of Instructor Effectiveness
    Peer Rating of Instructor Effectiveness
    Student Rating of Instructor Effectiveness
    Self-Rating of Instructor Effectiveness
    Objective Observation of Instructor Performance
    Student Change as a Measure of Instructor Effectiveness

The Predictors--Traits and Qualities Assumed to be Related to
Instructor Effectiveness
    Intelligence as Related to Instructor Effectiveness
    Education as Related to Instructor Effectiveness
    Scholarship as Related to Instructor Effectiveness
    Age and Experience as Related to Instructor Effectiveness
    Knowledge of Subject Matter, Present Professional Information,
        and Teacher Examination Scores as Related to Instructor
        Effectiveness
    Extracurricular Activities and General Culture Test Scores
        versus Instructor Effectiveness
    Socioeconomic Status, Sex, and Marital Status versus Instructor
        Effectiveness
    The Relation of Teaching Aptitude, Attitude Toward Teaching, and
        Interest to Instructor Effectiveness
    The Relation of Voice and Speech Characteristics to Instructor
        Effectiveness
    The Photograph as a Predictor of Instructor Effectiveness
    Statistical Analyses of Instructor Abilities
    Opinion Studies of the Personality Characteristics of Effective
        and Ineffective Instructors
    Personality Tests of Teachers

Implications for Further Research
    Criterion Research
    Predictor Research

Bibliography
    Reviews and Bibliographies



LIST OF TABLES

Table

 1  Reliability of Administrative Rating of Instructors
 2  Correlation of Administrative Rating with Other Measures
    of Instructor Effectiveness
 3  Correlations Between Ratings of Teacher Characteristics
 4  Reliability of Peer Rating of Instructors
 5  Correlation of Peer Rating with Other Measures of
    Instructor Effectiveness
 6  Reliability of Student Rating of Instructors
 7  Correlation of Student Rating with Other Measures of
    Instructor Effectiveness
 8  Intercorrelations by Trait of Student Rating of Instructors
 9  Correlation of Grades Received by Students with Their
    Rating of Their Instructors
10  Relationship of Teacher Factors to Student Rating
11  Relationship of Student Factors to Student Rating
12  Relationship of Self-Rating to Other Measures of Instructor
    Effectiveness
13  Reliability of Various Methods of Observing Teaching
    Effectiveness
14  Reliability of Measures of Student Gain
15  Correlation of Measures of Student Gain with Other
    Measures of Teacher Effectiveness
16  Correlations Between A.C.E. Psychological Examination
    and Various Measures of Teacher Effectiveness
17  Correlations Between Various Psychological Examinations
    and Measures of Teacher Effectiveness for Groups of
    Teachers of [illegible] or More
18  Relation of Education to Instructor Effectiveness
19  Educational Qualifications of "Best," White, High
    School Teachers
20  Relation of Practice Teaching Grades or Ratings to
    Scholarship
21  Relation of Practice Teaching Grades or Ratings to
    Teaching Effectiveness in the Field
22  Relation of Scholarship to Teaching Effectiveness in
    the Field
23  Age and Experience as Related to Teaching Effectiveness
24  Teaching Experience of Military and Civilian Instructors
    in Air Force Technical Schools
25  Relation of Scores on Subject-Matter Tests to Measures
    of Instructor Effectiveness
26  Relation of Scores on Professional Information Tests
    to Measures of Instructor Effectiveness
27  Relation of Extracurricular Activities to Instructor
    Effectiveness
28  Relation of Scores on the Cooperative General Culture
    Test to Measures of Instructor Effectiveness
29  Sex of Instructor as Related to Instructor Effectiveness
30  Relation of Scores on Measures of Teaching Aptitude to
    Teaching Effectiveness
31  Relation of Interest Test Scores to Teaching Effectiveness
32  Relation of Ratings of Voice and Teaching Ability
33  Opinion Studies of Traits, Qualities, and Characteristics
    of Successful Teachers
34  The Five Most and the Five Least Important of 46 Teacher
    Traits as Ranked by Four Groups of Judges
35  Relation of Personality Measures to Measures of Instructor
    Effectiveness
36  Relation of Social Adjustment Measures to Measures of
    Instructor Effectiveness



IDENTIFYING THE EFFECTIVE INSTRUCTOR:
A REVIEW OF THE QUANTITATIVE STUDIES 
1900-1952 



INTRODUCTION 

The equipments of modern warfare are highly technical. Successful
prosecution of a war demands that thousands of young men be able to
maintain and operate electronic and mechanical devices that are often
extremely complex. Since these men, upon induction, do not have the skills
and knowledges necessary to such tasks, the armed forces are required to
establish substantial training programs aimed at making satisfactory
technicians out of raw recruits.

Fast and effective training requires at its core skilled instruction.
The problem of how to select personnel who can successfully accomplish this
accelerated instructional job is thus crucial to the armed forces. Methods
of training these potential instructors most rapidly and efficiently must
also be developed. Research in the area of selection and training of
instructors has, therefore, very high probability of payoff in terms of a
more efficient military organization. The first step would appear to be
that of determining what is now known concerning the problems involved.

While the research literature was being surveyed as background material
it became apparent that a summary of the findings of the quantitative
studies had potential value for anyone concerned with instructor selection
and training problems, not only in the Air Force, but also in the other
services and in civilian institutions, schools, and colleges. With these
wider implications in mind, a comprehensive and critical review of
pertinent research reports has been prepared.

Over the past fifty years a considerable literature has been built up
concerning the problems associated with teacher effectiveness. Many of the
articles that have appeared merely reflect expressions of opinion in the
form of "armchair" analyses of teaching. Others, often written by the
original investigators, deal with theoretical considerations arising out of
research studies. Undoubtedly many of these general discussions are worthy
of attention. Inasmuch as the more pregnant theoretical implications
usually form an integral part of reports of actual research investigations,
it was decided to include in this review only those studies that involved a
quantitative attack on problems concerned with teaching effectiveness. Some
exception was made in the case of a few of the most recent theoretical
discussions by leading investigators in the field. Limiting the scope of
the review in this manner reduces the bulk of material to be handled
without seriously limiting the analyses of the problems of assessing
teaching effectiveness or neglecting the progress that has been made in
solving these problems.

In the search for quantitative studies over 900 references were examined.
Of these, over 360 were abstracted for inclusion in the review. To obtain
these references the following sources were used: Educational Index,
Psychological Abstracts, and some 40 reviews and bibliographies, including
the comprehensive Domas-Tiedeman (380) bibliography. A selected list of 28
of these reviews and bibliographies is included with the references
accompanying this report. While no assurance can be given that all
important research on instructor effectiveness has been covered, the
reviewers had available the extensive facilities of the library of the
University of Illinois as well as other sources of information.

Findings are presented as given in the original reports, even though in
some cases the research designs are obviously faulty, or insufficient
numbers of subjects have been used to allow statistically significant
generalizations to be drawn. The discussions of research studies and the
tabular material are presented chronologically under each topic heading,
except in a few instances where some specific feature of the investigations
is emphasized (e.g., in Table 30 order of presentation is chronological for
each test). The chronological order enables the reader to judge results in
terms of the tendency in later work to use more precise statistical methods
and improved research designs, and to report more meticulously the
conditions under which an experiment was conducted.

An attempt has been made to include in the tables all information
considered necessary for interpretation of results. In the column
describing the samples used in the various studies, besides the size of the
sample, level of teaching position is stated wherever known. Other data on
which a sample was selected are also given, for example, that the sample
was a dichotomous one of good-poor teachers, or that it was composed of
only inexperienced teachers. In cases where this additional information is
not included, it may be assumed that the sample was indeterminate except
for the particular variable cited.

From the arrays of results that have been assembled, the reviewers have 
set down what appeared in their opinion to be the most probable generaliza- 
tions arising from the data and have drawn certain conclusions from these to 
serve as a guide in Air Force technical training research projects. It is 
anticipated that these facts and conclusions may also assist other
investigators in research planning in this field.



PRINCIPAL FINDINGS OF CITED RESEARCH STUDIES 
Criteria 



The main findings of the quantitative studies reviewed in the present
report will be summarized.

Surveys of rating devices. Surveys of appointment blanks and rating
scales in use have failed to provide means for identifying the significant
items to be used in setting up instructor rating devices. The most
frequently mentioned qualities on existing teacher appointment blanks are
ability to discipline, ability to teach, scholarship, and personality.
There is no general agreement as to what constitutes the essential
characteristics of a competent teacher. Similarly, items on present rating
scales tend to be subjective, undefined, and varied, there being no
consistency as to what traits a supervisor might be expected to observe and
evaluate.

Administrative ratings. Administrative over-all opinion constitutes
the most widely used measure of instructional competence. Available studies
show in general that teachers can be reliably rated by administrative and
supervisory personnel (usually with r's of .70 or above). For the most part,
administrative ratings do not produce very high correlations with measures
of student gain. Intercorrelations of rated traits or categories appear to
give evidence that traits which are more objectively observable or are more
independent of opinion tend to be less prone to logical error or halo effect
than are those traits which are more intangible and hence more subjectively
estimated. The implication seems clear that by and large ratings made by
the same person are apt to be contaminated by halo and that in many such
instances a single rating of over-all effectiveness may be as useful as an
evaluation based on a composite of a number of ratings of separate traits.

Peer ratings. Peer ratings have been little used. For administrative
purposes they are probably not too useful since teachers have certain
misgivings about passing judgment on fellow teachers. From a research
standpoint in using peer opinion, ranks will probably give better results
than ratings. There is considerable agreement between supervisors and
fellow instructors in ratings of instructors. As in the case of
administrative ratings, considerable correlation is found among ratings
given different traits by the same peer raters. That is, halo influences
peer ratings just as it does administrative ratings.

Student ratings. The use of student ratings of instructor effectiveness
appears to be growing. Such ratings tend to show fair consistency, their
reliability, as with other ratings, increasing with the number of ratings
pooled in fairly good accordance with the Spearman-Brown formula. When
student ratings have been compared with other measures of instructor
effectiveness, rather diverse results have been found depending in part
upon the criteria employed. Considerable halo effect is usually found when
students rate their instructors on several traits. Whether or not grades
received by students affect their ratings apparently depends upon the
instructional situation. Results may indicate that if the instructor favors
the brighter students he will be approved by them and a positive
correlation between student ratings and grades will result. If he teaches
for the weaker students he will be disapproved by the brighter students and
a negative coefficient will be obtained. By and large such factors as size
of class, sex of students, age or maturity of students, and intelligence or
mental age of students seem to have little bearing on student ratings.
Research has been too sporadic and results too inconclusive to allow
generalizations to be made concerning the influence on student ratings of
other factors such as age and sex of teacher, length of students'
acquaintance with the teacher, length of time teacher has taught in the
school or taught a student, pleasurable personal relationships between
student and teacher, and whether or not subject taught by rated teacher is
students' favorite subject. There is considerable expressed opinion but
little research evidence that student ratings will contribute to instructor
improvement or could be used to improve supervisory ratings.
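
The Spearman-Brown relation referred to under student ratings can be
illustrated numerically. The following is a minimal sketch and is not drawn
from any of the studies reviewed; the function name and the single-rating
reliability of .30 are assumptions chosen only for illustration.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability of the mean of k pooled ratings,
    given the reliability r_single of a single rating."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Hypothetical reliability of one student's rating (illustrative value only).
R_SINGLE = 0.30

for k in (1, 5, 10, 25):
    pooled = spearman_brown(R_SINGLE, k)
    print(f"{k:>2} ratings pooled: predicted reliability = {pooled:.2f}")
```

Under this assumption the predicted reliability rises quickly as the first
few ratings are pooled and then levels off, which is the pattern the
formula implies for any positive single-rating reliability below 1.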

Self-ratings. While there is some tendency for instructors to overrate
themselves, self-ratings show negligible relationship with administrative
ratings, student ratings, or measures of student gains. On the basis of the
few available studies of self-ratings of instructors, the obvious,
undisguised self-rating technique would seem to offer little encouragement
for evaluative or research purposes.

Systematic observations. Systematic observation techniques to determine
differences in performance of effective and ineffective instructors
have been largely neglected in research in the instructor area. Most of
the observations made have been dependent upon the subjective judgment of
the observer. In general, the reliability of planned observational
recording compares favorably with other methods of instructor evaluation.
The most general criterion of validity of observation has been face
validity. No single, specific, observable teacher act has yet been found
whose frequency or per cent of occurrence is invariably significantly
correlated with student achievement. There seems to be some suggestion,
however, that questions based on student interest and experience rather
than assigned subject matter, the extent to which the instructor challenges
students to support ideas, and the amount of spontaneous student discussion
may be related to student gains. Apparently there are no optimum time
expenditures for particular class activities; a good instructor may
function successfully within a wide range of time expenditures. A factor
analysis of a number of instructor and student behaviors resulted in three
factors: (a) understanding, friendliness, and responsiveness on the part of
the instructor, (b) systematic and responsible instructor behavior, and
(c) the instructor's stimulating and original behavior.

Student gains. Of the several methods used to measure student change,
residual student gain, that is, the difference between actual gain and
predicted gain, is becoming more widely used as a criterion of instructor
effectiveness. With all its difficulties it appears to offer one of the
best criteria thus far used. As compared with commonly reported test
reliability coefficients those obtained in gains studies have been low. The
great discrepancies in the findings of investigators who have examined the
student gains criterion emphasize the extreme variability in relationship
with other criteria used to indicate instructor ability. Within the limits
of measures so far used, the relationship between administrative opinion of
an instructor's competence and the amount of subject matter that the
instructor will impart to his students cannot be predicted.
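
The residual gain criterion described under student gains can be sketched
as follows. This is an illustration only: predicting each student's gain
from his pretest score by a least-squares line is an assumption (the
studies reviewed used various predictors), and the scores, instructor
labels, and function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_gains(pretest, posttest):
    """Actual gain minus the gain predicted from the pretest score."""
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    slope, intercept = fit_line(pretest, gains)
    return [g - (intercept + slope * pre) for pre, g in zip(pretest, gains)]


# Hypothetical scores for six students taught by two instructors.
instructor = ["A", "A", "A", "B", "B", "B"]
pretest = [40, 55, 70, 42, 58, 68]
posttest = [62, 71, 80, 55, 66, 73]

residuals = residual_gains(pretest, posttest)

# Average the residual gains of each instructor's students.
by_instructor = {}
for name, r in zip(instructor, residuals):
    by_instructor.setdefault(name, []).append(r)

for name, rs in sorted(by_instructor.items()):
    print(f"instructor {name}: mean residual gain = {mean(rs):+.2f}")
```

A positive mean residual indicates that an instructor's students gained
more than their pretest standing would predict; with so few cases the
figures carry no statistical weight and serve only to show the arithmetic.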



Predictors 



Intelligence. Whether or not intelligence is an important variable
in the success of the instructor apparently depends upon the situation.
In general there appears to be only a slight relationship between
intelligence and rated success of an instructor. Correlation coefficients
for high school teachers tend to be somewhat higher and somewhat less
variable than those reported for elementary teachers. For all practical
purposes, however, this variable appears to be of little value as a single
predictor of rated instructor competence.

Education. Considered as a group, the investigations of semester hours
or years of education as related to instructor efficiency have indicated
that any relationship that may exist is slight. Beyond certain more or less
obvious knowledge requirements, greater or lesser education of a teacher in
terms of courses or semester hours seems to be unimportant in
discriminating between good and poor teachers.

Scholarship. Implications of studies reviewed with respect to
scholarship are quite clear. Grades a student will obtain in a practice
teaching course may to some extent be predicted by the grades that student
obtained in college. Accurate prediction of success in practice teaching,
however, cannot be made on the basis of an individual's scholastic record
in high school. Almost all available studies report low positive
correlation coefficients between measures of on-the-job performance of
teachers and earlier scholarship as reflected in over-all achievement in
high school or college, or in standing obtained in specific college courses
(including practical teaching courses). There appears to be some
relationship, but it is small. No investigator has shown that the
attainment of a particular standing in high school or college or the
mastery of any single course or group of courses is essential to teaching
competence. The positive correlation coefficients usually found probably
reflect primarily the relationship of general intelligence to both academic
and teaching success.

Age and experience . It appears that a teacher's rated effectiveness 
increases at first rather rapidly with experience and then more slowly up 
to five years or beyond. There is then a levelling off, and the teacher 
may show little change in rated performance for the next fifteen or twenty
years, after which, as in most occupations, there tends to be a decline. 

Knowledge of subject matter. Whether or not knowledge of subject matter
is related to instructor competence seems to be a function of the
particular teaching situation. Some studies suggest that too much knowledge
on the part of the teacher may result in teaching "over the heads" of
students.

Professional information. Scores on tests of professional information
appear to bear some slight relationship to supervisory ratings or rankings
of instructor competence. Contradictory results have been obtained, however,
when such scores are correlated with pupil gain.

Extracurricular activities. In general, investigators have found low
positive relationship between an individual's participation as a student in
extracurricular activities and his later instructor effectiveness.

General culture. Studies reviewed appear to indicate that the relation
of Cooperative General Culture Test scores to instructor effectiveness
differs little from those reported for other subject matter tests.

Socioeconomic status. Studies of the relationship of socioeconomic
status (as measured by such devices as the Sims Socio-Economic Scales) to
criteria of instructor effectiveness show little, unless it is that those
from higher status groups have greater probabilities of success in life
than those less fortunate.

Sex. No particular differences have been shown when the relative
effectiveness of men and women teachers has been compared.

Marital status. Despite some prejudice to the contrary there appears
to be no evidence that married teachers are in any way inferior to unmarried
teachers.

Teaching aptitude. Results obtained from measures designed to predict
teaching ability show great disparity. Data thus far available either fail
to establish the existence of any specific aptitude for teaching with any
degree of certainty or indicate that tests used were inappropriate to its
measurement.

Teaching attitude. Attitude toward teachers and teaching as indicated
by the Yeager Scale devised for its measurement seems to bear a small but
positive relationship to teacher success measured in terms of pupil gains.

Interest in teaching. In most of the studies reviewed, interest in
teaching was measured by interest test scores which indicated similarity of
interest of teachers and persons undergoing the interest test. Correlations
resulting from the use of several standard interest tests either cluster
around zero or are so inconsistent as to render such tests of rather
doubtful value as predictors of teaching success. The common factors that
were found through factor analyses to underlie the reasons given for
choosing the teaching profession are perhaps provocative of further
research but were based on too few cases to justify any clear-cut
interpretation.

Voice and speech characteristics. On the basis of studies reviewed,
in general, it appears that the quality of the teacher's voice is not
considered too important by school administrators, teachers, or students.
In one study, however, certain speech factors were found to be correlated
significantly with student gains and with effectiveness ratings of
supervisors. The intercorrelations of the speech factors, however, were so
high that general speech ability based on a single factor is probably as
useful as a composite of judgments based on several speech factors.

The photograph. Studies of the use of the photograph as a predictor
of instructor effectiveness have failed to demonstrate that photographs
have any predictive value.

Statistical analyses of instructor abilities. Such instructor factors
as empathy, professional maturity, general knowledge, mental ability, social 
adjustment, and the like have been identified through factor analyses by 
various investigators. The statistical analyses so far reported, however, 
suffer from inadequacies of criteria, testing instruments, or number of 
cases. 

Opinion studies of instructor personality characteristics. The attempts
made to identify characteristics of successful and unsuccessful instructors
by making lists of traits based on opinion appear largely sterile in terms
of usability for evaluation or selective purposes.

Causes of teacher failure. In most of the studies of unsuccessful
teachers poor maintenance of discipline and lack of cooperation tend to be
found as the chief causes of failure. Health, educational background,
training, age, and knowledge of subject matter, on the other hand, appear
to be relatively unimportant factors in terms of teacher failure.

Personality tests. Results obtained with personality tests of teachers
have shown wide variation when correlated with other measures. Some
so-called personality tests appear to show significant correlations with 
certain measures of instructor effectiveness. Until carefully controlled, 
well-designed studies employing adequate numbers of instructors have been 
made, however, the problem of determining the personality patterns of ef- 
fective teachers must still remain unsolved. 



CRITERIA OF INSTRUCTOR EFFECTIVENESS 

By common definition a criterion is any standard used for judging. For
the scientist, however, such a definition is inadequate. A criterion which
is to be used for scientific judgments cannot be just any standard. It
should be the best possible standard for the particular class of judgments
that are to be made. This means that the scientist must be able to justify
his choice of a criterion by demonstrating its logical relevance to the
problem at hand and by showing that it possesses measurement
characteristics which are technically adequate.

So long as the investigator restricts his research to laboratory studies
the establishment of a justifiable criterion usually presents no great
difficulties. A criterion for memory, for instance, may be the recitation
without error of a list of nonsense syllables, or the criterion of learning
may be a specified minimum of blind alleys a rat enters while traversing a
maze. The moment research is moved into less rigidly controlled life
situations, however, the investigator is confronted with criterion problems
which are seldom simple and often impossible of completely adequate
solution. The determination of a scientifically justifiable criterion of
instructor effectiveness presents such problems.

Every educational system and every training program has certain goals.
The first requirement for choosing a criterion of instructor effectiveness
is that these goals be defined. The measure of a particular teacher's
effectiveness is then the extent to which that teacher facilitates the
students' progress toward these goals. Since in any system there are
usually several educational goals, a measure appropriate to each goal is
indicated. The construction of a single, over-all criterion of instructor
effectiveness would require that these various measures should be weighted
into this criterion in accordance with supportable value judgments as to
their relative importance.
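
The weighting of several goal measures into a single over-all criterion
can be sketched as follows. This is an illustration under stated
assumptions rather than a procedure taken from the studies reviewed:
standardizing each measure before weighting is an assumption, and the goal
names, scores, weights, and function names are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one measure so differently scaled goals are comparable."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("measure has no variation across instructors")
    return [(v - m) / s for v in values]


def composite_criterion(measures, weights):
    """Weighted sum of standardized goal measures, one value per instructor."""
    total = sum(weights[goal] for goal in measures)
    standardized = {goal: z_scores(vals) for goal, vals in measures.items()}
    n = len(next(iter(measures.values())))
    return [
        sum(weights[goal] / total * standardized[goal][i] for goal in measures)
        for i in range(n)
    ]


# Hypothetical goal measures for four instructors (illustrative values only).
measures = {
    "subject_matter_gain": [12.0, 9.5, 14.0, 8.0],
    "later_course_success": [0.70, 0.65, 0.60, 0.75],
    "personal_adjustment": [3.1, 3.4, 2.9, 3.6],
}

# Hypothetical value judgments as to the relative importance of each goal.
weights = {
    "subject_matter_gain": 0.5,
    "later_course_success": 0.3,
    "personal_adjustment": 0.2,
}

for i, score in enumerate(composite_criterion(measures, weights), start=1):
    print(f"instructor {i}: composite criterion = {score:+.2f}")
```

The ordering of instructors on such a composite depends directly on the
weights chosen, which is why the value judgments behind the weights must
themselves be supportable.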

Obviously, the fulfilling of the requirements for such a criterion of
instructor effectiveness is a large order. The comparative student changes
that would require measurement in certain educational systems, or at
certain stages in a particular curriculum, might quite defensibly include
such aspects as: changes in knowledges of specific subject matter, improved
success in subsequent schooling, improved personal adjustment, or increased
success in life. It is conceivable, also, that the effective teacher
contributes to changes in other teachers' pupils through individual
guidance, assistance in planning the school program, good influence on
group morale, and the like, thus creating effects that cannot be isolated,
or ascribed to any one teacher.

In the studies reviewed, the criterion problems have been handled with
widely varying degrees of sophistication. Measures found acceptable as
criteria of instructor effectiveness by one investigator are often considered
as unvalidated potential predictors by others. In order to provide for
comparisons among studies and for appraisal of research progress, the
reviewers have grouped together what appeared to them to be comparable
studies. The
basis for these groupings rests on the use by the investigators of similar 
criteria, or where no measures appeared to merit designation as a criterion, 
of similar potential predictors. 

The largest grouping covers studies in which ratings or rankings of
teachers have been used as criteria. Most commonly the reporting
investigator does not deal explicitly with the problem of the relevance of
such criteria to teacher effectiveness. In the opinion of the reviewers, if
one is concerned with teacher effectiveness as the changes brought about by
the teacher in the teacher's own pupils, then ratings and rankings are less
relevant than either measures of student change or controlled observations
of student behavior. Ratings are someone's estimate of the effects on
students of those teacher characteristics the rater happened to observe, and
which he deemed important. Without demonstration that these estimates have
relationship to student achievement, they cannot really be considered as
satisfactory substitutes for measures of pupil change. On the other hand, if
one is considering that part of teacher effectiveness which the teacher
contributes to the growth of all pupils by participation in the efforts of
the educational group, then ratings or rankings would seem to be somewhat
more relevant. In this latter case the influence of the teacher is a
function of the quality of the teacher's relations with students in general,
with other teachers, supervisors, and the community. Differential
effectiveness is a matter of differential contribution to the over-all goals
of the school or educational system. Since such contribution is almost
inevitably in a cooperative setting, and since its effects are diffuse and
(almost certainly) unmeasurable, there would appear to be logical
justification for an attempt to get estimates of effectiveness in this area
by the use of ratings or rankings obtained from other people in the
educational situation.

Another section covers studies in which observational measures of 
teacher performance have been used. It is plausible that changes in
students should be related to what the teacher does and how he does it.
Furthermore, it seems reasonable that careful and objective observation of
the teacher's behavior in the teaching situation could provide a measure of
the teacher's effectiveness. A number of investigators have thus attempted
to achieve objectivity in a criterion by the use of observational measures 
of teacher performance. However, before any method of objectively evaluat- 
ing effective performance on the part of a given teacher can become useful, 
such method must be proved to be capable of measuring kinds of teacher be- 
havior related to the type and amount of change the teacher produces in 
her pupils. 

Studies that used measures of pupil change as a criterion are also 
grouped together. Granting that many of the pupil changes that would
indicate a teacher's effectiveness are in behaviors that are not measurable,
or at least have not yet been measured, there is at least one area in which
measurements have been made. This is the area of student changes in knowl- 
edge of subject matter. While adherents of various educational philosophies 
might disagree as to the importance of changes in subject matter knowledge 
relative to other kinds of desired changes, it seems probable that all would 
agree that such changes have some importance and that they are relevant to 
the problem of teacher effectiveness. 

The last section of the review covers other instructor or student vari- 
ables or measures that were included in the studies read. These the re- 
viewers have classified as "possible or potential predictors" regardless of 
what they were designated by the original authors. They are so classified
because many of them, if their correlation with an adequate criterion of
instructor effectiveness could be demonstrated, would be useful in the
selection of personnel for teacher training or for assignment to teaching
positions. Within the section the various classes of potential predictors
or correlates are placed together to allow comparisons to be made and, where
possible, conclusions to be drawn.



Rating the Effectiveness of Instructors
An Appraisal of Instructor Rating Methods 

In attempts to evaluate instructors systematically many kinds of rating 
methods have been used (361). In a great many of the studies reviewed,
investigators adopting rating as a criterion of teaching effectiveness have
accepted the rating scale or method in use in a particular school situation. 
The types of rating scales which have been most favorably received by school 
administrators are the graphic, the check list, and to a lesser extent the 
rank order or order of merit. Consequently, these scales account for nearly 
all of the studies using rating as a criterion. In a few studies, however,
the paired-comparison, critical-incidents, or forced-choice type of rating
scales have been used. 

The reason for the varying degrees of popularity of the different types 
of rating scales for administrative use is obvious. Ease of administration 
plus assurance that the administrator can follow his subjective leanings
appear to have been the factors given the greatest weight in the choice of 
a rating method. 

Since the results obtained in rating teaching effectiveness depend in 
part on the adequacy of the methods used, a brief appraisal of some of the 
more usual methods that have been applied to instructor rating seems
appropriate to the purposes of this review.

The graphic rating scale is simple, comprehensible, easy to administer, 
free from direct quantitative terms, and discriminates as finely as the 
rater desires. It is also very susceptible to leniency effects.

The check list, on superficial appraisal, appears to be a simply
constructed device though it is cumbersome to administer. To achieve a
technically sound instrument, however, it is necessary to do more than just
compile a collection of random statements. A thorough job analysis should
be undertaken and, as with other rating methods, comparative evaluation must
be made of the various behaviors to discover those elements which determine 
good and poor instructors. 

The rank-order technique, while offering a simple means of evaluating
instructors, lacks the popular appeal of the above two methods. From a
research point of view its chief drawbacks are that it does not indicate
the magnitude of the differences between persons rated nor does it
indicate the differences between groups. This device has sometimes been used
to validate other methods. 

Although some investigators have claimed that the paired-comparison 
technique tends to be more accurate than rank-order or rating-scale methods, 
it has lacked favor among administrators, first, because it is extremely 
time consuming and laborious especially when used in rating large groups 
and, second, because there is usually a very high correlation between paired 
comparisons and rankings. It is also somewhat more resistant to
manipulation by the rater than rank-order or graphic rating scales. Some
investigators have recommended this device as a criterion of validity
against which less rigorous methods of rating may be checked.

The search for more stable rating methods has led to the development 
of the critical-incidents and forced-choice techniques. Of these the
forced-choice technique appears to be the more promising. The unique fea- 
ture of this technique is that it limits the rater's control of the final 
result of his rating, thus effectively reducing biasability (272).
Limiting the rater's control helps also to counteract another weakness
usually associated with rating, that is, the rater's tendency to become more
and more lenient with repeated ratings. Nonbiasability effectively minimizes
the effects of this changing frame of reference on the part of the rater.

The critical-incidents method, devised by Flanagan (113, 114), was
developed as a means of identifying the important and valid behaviors on
which rating should be made. So far it has not shown much promise in the
rating of instructors. Domas (104) and Jensen (167) in attempts to use
this method in school situations have demonstrated, perhaps unintentionally,
the principal weakness of the method. When the "critical incidents" have
been collected some attempt must be made to organize them so that they may
be used conveniently. The resulting categories appear, however, as a list
of vague generalities which might have been jotted down without going through
all the elaborate process of accumulating the incidents. After Domas had
collected 1000 and Jensen had assembled 500 critical incidents, they found
they were unable to fit them into categories except as they represented
effective or ineffective behavior and so presented them in their reports.
Charters and Waples (74), incidentally, encountered the same difficulty
when they attempted to organize lists of characteristics essential for
successful teaching. Another principal weakness of the critical-incidents
technique is that it depends entirely on the conception of effectiveness
held by those who report the incidents. In applying the technique to
teaching, its validity depends on the opinions of effective teaching held by
the particular superintendents, teachers, students, or others from whose
reports incidents are sought.

High reliability in terms of agreement among raters depends upon
precise definitions of traits being rated so that raters have a common
understanding of what is being rated, and sufficient frequency of occurrence
of the behavior, trait, or quality so that systematic, extensive observations
may be made. Wrightstone (361) reports studies by several investigators
which tend to show that the following traits can be more reliably rated:
efficiency, originality, perseverance, quickness, judgment, energy,
scholarship, leadership, and intelligence. Such traits as courage,
selfishness, cheerfulness, kindness, judicial sense, and tact proved not to
be so reliably rated. These findings are perhaps specific to the raters, the
rating scale used, the ratees, and the situation. It should be pointed out
that it is doubtful if such literary traits as those exemplified here can be
sufficiently well-defined to be useful nor can they be agreed upon by
different raters, except perhaps as they uniformly reflect halo from an
agreed on reputation. Asch (11) has shown that the content and functional
value of a trait changes with the context of other traits. Gaining an
impression of another person is not a process of fixing each trait in
isolation and noting its meaning but rather a summation of the effects of
these traits.

For this reason it is probably more accurate to judge whole impressions
rather than artificially isolated traits. Carefully planned studies,
however, might well enable predictions to be made as to what types of traits
and behaviors can be more reliably rated than others.

The reliability and validity of ratings tend to be reduced by several
sources of error. Among these should be included judgments based on
insufficient evidence, lack of training of the rater, and poor rating
devices. Subjective rating scales depend largely upon memory and therefore
are subject to errors by forgetting.

Another source of error lies in the fact that some raters tend to
overrate and some to underrate, while still others tend to rate everyone near
the middle of the scale. Thus, ratings made by different raters may reflect
differences in rating habits rather than differences among the people rated.

Perhaps the greatest sources of error are those of "halo effect," first
noted by Wells (349), and "logical error." Halo effect is the tendency of
the rater to rate one trait or quality high (or low) because another trait
or quality has been rated high (or low) or because the rater knows that the
individual rated excels (or is particularly weak) in some respect. Logical
error arises from presuppositions in the minds of the raters and lack of
definiteness of the trait being rated.

Ratings also tend to become more and more meaningless with repeated
use. This is well illustrated by the results of repetition of the same
scale in rating Army officers. In 1922, 25% of Army captains were rated
as excellent; by 1940 the percentage had reached [illegible]; while in 1945,
95% of captains received an excellent rating (13). Increased leniency with
repeated ratings is probably not directly a function of the type of rating
scale but rather due to the operation of social and situational pressures.
With repeated ratings there tends to be a changing frame of reference on
the part of the raters. It should also be noted that leniency tendency is
not as serious a drawback under research conditions as contrasted with
operational conditions.

The reliability of rating scales is increased by pooling the ratings of
several judges. As shown in findings reported by Bryan (61), Remmers et al.
(270), and others, reliability in ratings increases with the number of
ratings pooled in fairly good accordance with the Spearman-Brown formula.
Bendig (29), in a study in 1952 of inter-judge versus intra-judge reliability
of the order-of-merit method, found the relationship between these two types
of reliabilities to be U-shaped. The groups of judges with the most highly
reliable and most highly unreliable intra-judge reliability showed the most
group agreement. Furfey in Wrightstone (361) showed that reliability was
increased also by subdividing traits and having ratings made on the
subtraits.
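
The Spearman-Brown relation mentioned above can be illustrated with a short
sketch. The formula itself is standard; the function names and the example
reliabilities are illustrative assumptions and are not figures from the
studies cited.

```python
import math


def spearman_brown(r_single, k):
    """Predicted reliability of the pooled ratings of k raters, given the
    reliability r_single of one rater's ratings (Spearman-Brown formula)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_single / (1 + (k - 1) * r_single)


def raters_needed(r_single, r_target):
    """Smallest number of pooled raters predicted to reach r_target.
    Obtained by solving the Spearman-Brown formula for k."""
    if not (0 < r_single < 1 and 0 < r_target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = r_target * (1 - r_single) / (r_single * (1 - r_target))
    # Small tolerance guards against floating-point overshoot at whole numbers.
    return max(1, math.ceil(k - 1e-9))


# Hypothetical example: a single-rater reliability of .50.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.50, k), 3))
```

Under the formula, pooling raises reliability quickly at first and then with
diminishing returns, which is why a handful of independent judges is often
recommended in place of a single rating.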

In the next sections the results of studies dealing with ratings made
by administrators, fellow teachers, the teacher himself, and students are
reviewed. In interpreting the results of these studies the many sources of
error in rating methods must be constantly borne in mind. By and large in- 
vestigators have tended to ignore the problems of correcting for the various 
sources of error and have worked with ratings as though they were already a 
perfected criterion. 



Surveys of Types and Content of Scales 

In an attempt to determine what characteristics of instructors are
considered desirable or essential by authorities in the field of education,
several studies have been made of appointment blanks or rating scales as
used by teacher-training institutions, university departments of education,
or state departments of public instruction. In most cases the procedure
consisted of collecting the forms used, tabulating the items on the rating
sheets, and determining the total frequency a given trait or quality was
mentioned on the rating devices used by all the institutions surveyed. 

In 1920 Osburn (250) attempted to determine the desirable personal
characteristics of the teacher by studying appointment blanks used by 121
teacher-training institutions. The outstanding finding of this investigation
was the lack of agreement as to what constitutes the essential personal
characteristics of a competent teacher. The universities tended to be in
somewhat closer agreement than the normal schools. Ability to discipline,
ability to teach, scholarship, and personality were the most frequently
mentioned qualities.

A critical analysis of rating sheets in use for rating student teachers
in institutions of the North Central Association of Secondary Schools and
Colleges was made by Smith (318) in 1936. Of the 128 institutions replying
to a request for information, 103 made use of some form of rating sheet.
Approximately 77% of these depended solely upon personal opinions of the
raters. In 1941 Samuelson (291) reported a survey of rating scales in use in
approximately 50 teachers' colleges and schools in 29 states. The
investigator's chief finding was the variety of practices and methods of
measurement employed. Graphic scales, usually with five-point scale division,
predominated, although descriptive scales, letter scales, and numerical
scales were also used. In 1940 Schellhammer (294) also examined rating
procedures in 109 teacher-training institutions. The forms, he found, varied
from a single rating of 9 items to a comprehensive scoring of 72 items on a
seven-point scale. Intelligence and health items appeared most frequently,
with no other item appearing more than 11 times. This seemed to indicate that
there is no general agreement as to which characteristics the supervisor
might be expected to observe and evaluate. Petersen and Cook (255) in 1930,
Dean (99) in 1939, and Woellner (358) in 1941 also surveyed rating procedures
used in teacher-training institutions.

Barr and Emans (19), in 1930, in order to determine what qualities
are prerequisite to success in teaching, studied 209 teacher rating scales
collected from cities of more than 25,000 population, from state departments
of public instruction and from university departments of education. They
reported that the 6939 items found in the rating scales tended to be highly
subjective and undefined. The scales also varied widely in content and
organization, many being either quite superficial or apparently representing
special points of view or systems of teaching.

In 1945 Reavis and Cooper (262) surveyed rating methods in use in 123
city school systems. They reported that the most notable characteristic of
the rating devices employed was their lack of uniformity. The instruments
varied in type, in number of items to be rated, in specific characteristics
included, and in individual responsible for the rating. In one city teachers
were rated "only by degrees held." A total of 1538 items were included
in the scales used. Of these only 256 appeared on more than one device.

It would seem that the survey method might provide an obvious way of
determining the significant items to be used in setting up instructor
rating devices. The studies summarized, however, appear largely sterile. The
meaningless sort of results obtained are probably due to the failure of the
surveyors to develop a rationale which could be imposed on the materials
surveyed. The reliability of the categorizing of descriptive terms for
traits or characteristics would have to be tested. Single judgments, or even
judgments based on a group of closely associated judges, would not suffice.
Rather, agreement should be tested for fitting the categories into the
rationale by a series of independent judges, much in the same way that the
reliability of the categorization of behaviors by independent observers is
studied in time-sampling studies. Such surveys of content are not apt to
produce results worth the effort until, through empirical or other means,
hypotheses concerning what teaching characteristics should be rated are first
formulated and then these hypotheses are checked by reference to
institutional practice.



Types of Raters 

Rating devices not only differ in form and content but they are also
designed to be used by different classes of raters. An instructor's
competence, for instance, may be rated by his supervisor or by an outside
expert, by his fellow instructors, by his students, by himself, or by some
combination of these. Most instructor ratings heretofore have been made
by administrative personnel, but in recent years student ratings of their
instructors have been receiving more and more widespread use.



Administrative Rating of Instructor Effectiveness

As has been repeatedly shown by surveys, many school systems employ
unstructured rating procedures, the most widely used measure of an
instructor's competence being the over-all opinion of the principal,
supervisor, superintendent, or school inspector. On the basis of judgment of
such administrative personnel, instructors may be selected, hired,
promoted, or fired. To the best of the reviewers' knowledge, a rating form
for teachers was first used administratively in Milwaukee in 1896 (l?0).
By 1903 school systems in a number of other cities were also using rating
forms.



Demonstrated lack of agreement among administrators, however, and
the undependable nature of subjective opinions in general have led to
frequent attempts to put instructor rating on a sounder footing through
the use of more analytic administrative rating devices. One of the
earliest attempts to quantify instructor behavior was the tentative scheme
for the measurement of teaching efficiency outlined by Elliott (106) in
1910. He based his method on the premise that the teacher was an
"octo-personality": executive, projecting, supervising,
professional-technical, social, physical, moral, and dynamic.

Investigations of the reliability, validity, and halo effect of
administrative ratings utilizing rating devices will be examined in this
report.



Reliability of Administrative Rating of Instructor Effectiveness

Reliability can be measured (a) between raters, (b) for a single rater 
from one rating scale or item to another (which may reflect halo effect), 
and (c) between ratings by the same rater from one occasion to another. 

The available studies appear to show that teachers can be reliably rated
by administrative and supervisory personnel, the preponderance of
reliability coefficients reported being .70 or above. As shown in Table 1,
there is considerable variation, coefficients of reliability for rated
general effectiveness ranging from .17 to .98. When traits or qualities
other than general ability are rated, the reliabilities tend to be
somewhat lower than those found for general effectiveness [Barr (16),
Boardman (39)]. Part of the range of reliability coefficients can be ascribed
to method. Where correlations were obtained when the same raters used
different methods or scales, the coefficients tended to be high, e.g.,
the r of .96 is of this type. Where correlations were obtained between
two different raters using the same method, coefficients were considerably
lower, e.g., the r of .32 in the Hamrin study (143). Hampton (142)
in her study of administrative ratings made in 1951 found that
"correlations between successive trait ratings of the same persons were
different from zero at the one per cent level, trait by trait, when the raters
were the same and nominally equal to zero when the raters changed."

The reviewers found some confusion among authors as to whether re- 
liability or validity was involved in certain of the correlation co- 
efficients computed. When raters are of equivalent prestige, status, or 
standing, the reviewers have assumed that consistency of ratings, i.e.,
reliability, is intended. Such studies are reported in this section. 

When raters are of obviously unequal prestige, of different classes, or
the comparisons are with an entirely different order of criterion variable, 
the studies are included in the following section on validity. 

In order to insure that raters were confronted with a common situation,
Shiels (308), in 1915, asked 110 principals to rate the same ten case
studies of teachers for instruction and discipline on a five-point scale.
Higher reliability would be expected than would be the case in rating real
teachers since the judges were basing their opinions on identical data. The
ratings, however, showed considerable variation and range, there being no
instance of 100% agreement. There was, moreover, less than 75% agreement in
all but four cases in rating instruction, and in all but two cases in rating
discipline.

In Barr's study (16) similar results were found. When 60 visiting
superintendents observed, for two different periods of 30 min. each, the
teaching of one teacher relatively unknown to them and then rated the
teaching effectiveness of that teacher, great divergence of opinion was
found. The superintendents spread their ratings on all traits over at
least 9 points of a 10-point scale and for more than 50% of the items
over all 10 points. One superintendent commented on the poorness of the
teaching, while another remarked that he wished he could employ the teacher
in his school. Correlations between first and second observation by the
superintendents also proved in general to be low. Barr stated that an
outstanding fact brought out by this study was that supervisors cannot
agree when asked to analyze a teaching situation about which they have no
advance information. He concluded further that "conventional supervision
is highly subjective."



Correlation of Administrative Ratings with Other Measures of Instructor 
Effectiveness 

A number of investigators over the past thirty years have made
comparisons of various criteria of instructor effectiveness. Their studies
have been summarized in Table 2. The correlation coefficients, where
reported, range from -.61 to .82, the former being determined by Jones
(172) when he compared principals' ratings of 13 teachers with gains made
by their pupils in English and the latter by Nanninga (238) when he
compared principals' with assistant principals' ratings of 15 high school
teachers. In some instances rather substantial coefficients were
obtained when ratings of various types of administrators were compared
[e.g., Brandt (51), Bryan (61), Nanninga (238), Tiegs (333)]. In these
and other cases where relatively high correlations were reported,
opportunities for collaboration, prior discussion, or other sources
of contamination of data were not completely ruled out. For the most
part administrative ratings do not produce very high correlations with
measures of student gains [e.g., Brandt (51), Taylor (331)].

In Knudsen's and Stephens' (180) analysis of 57 published devices
for rating teaching, they discovered that often the validity of the device
was implied in the assumption of the competence of its designers to select
significant traits. Forty gave no statistical evidence of validity or
reliability; 11 mentioned correlations between ratings of the same teacher
by different judges; 4 quoted correlations of weights assigned to various
items on a given device by different judges; 3 gave correlations between
successive judgments of the same judge; 2 included intercorrelations of
scores assigned by different judges; and 2 mentioned correlations of scores
on items with scores on general merit.



Intercorrelations of Rated Traits or Categories 

Some, if not all, of the studies reported in this section appear to
give evidence of the presence of the halo effect which tends to bias
ratings in general. A number of the investigators whose studies are reviewed
here have called attention to this factor as at least a partial cause of
the large correlation coefficients found when ratings of several traits
by the same rater are compared. Other investigators report high
correlations without comment.

In any interpretation of these studies it is important to recognize
that, for instance, a coefficient of .90 between rated "efficiency" and
rated "use of methods" does not mean that good methods lead to efficiency;
it merely means that raters tend to rate a given person at the same
relative level on the two traits. It should be noted that two kinds of
intercorrelation may indicate halo effect. The first kind is the correlation
found when ratings of two traits by the same rater are plotted against
each other. The second kind of correlation is that found when mean
ratings of two traits of several instructors are plotted against each other.
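
The two kinds of intercorrelation can be made concrete with a small sketch.
The ratings below are invented for illustration only, the trait names are
placeholders, and the product-moment (Pearson) coefficient is assumed as the
measure of correlation.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: ratings[rater][instructor] = (efficiency, use_of_methods)
ratings = {
    "rater_1": {"A": (9, 8), "B": (5, 6), "C": (3, 2), "D": (7, 7)},
    "rater_2": {"A": (6, 7), "B": (8, 8), "C": (4, 3), "D": (5, 6)},
}


def within_rater_r(ratings, rater):
    """First kind: one rater's ratings of two traits across instructors."""
    pairs = list(ratings[rater].values())
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


def mean_rating_r(ratings):
    """Second kind: mean ratings (over raters) of two traits across instructors."""
    instructors = sorted(next(iter(ratings.values())))
    k = len(ratings)
    mean_a = [sum(r[i][0] for r in ratings.values()) / k for i in instructors]
    mean_b = [sum(r[i][1] for r in ratings.values()) / k for i in instructors]
    return pearson(mean_a, mean_b)


print(round(within_rater_r(ratings, "rater_1"), 2))
print(round(mean_rating_r(ratings), 2))
```

A high value from either computation shows only that the two traits were
rated at similar relative levels; it cannot by itself separate a true
relation between the traits from halo.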

In Table 3, 12 studies are summarized in which correlations were
computed [in the Bryan (61) and Brookover (55) studies actual coefficients
were not reported] between some rating of general teaching merit and
ratings on some other teaching characteristic where the two types of ratings
were made by the same rater. It will be noted that, in general, the
coefficients tend to be high, probably indicating operation of considerable
halo effect. In some cases the relationships are quite as ridiculous as
those Knight (178) found and commented on in his study of peer ratings.
Knight obtained a correlation coefficient of [illegible] between general
teaching ability and intellectual ability and one of .79 between teaching
ability and skill in discipline when these were rated by fellow teachers. He
also found a correlation of .86 between ratings on skill in discipline
and intellectual ability. In pointing out the absurdity of these
correlations, Knight said, "Were this really the truth, what a prodigy of
intellect the 'strict,' but often dull, teacher would be!" Further, "If
we thus generalized, we would also hold that Grant, admittedly a past
master in control, also towered above Lincoln in mental stature."

In the case of certain traits, however, the correlation coefficients
are low. For instance Ruediger and Strayer (283) report a coefficient of
[illegible] between general merit and health and .20 between general merit
and appearance; and Boyce (48) found a correlation of .18 between general
merit and health. On the other hand Boyce reported a correlation of .90
between general merit and instructional skill. This suggests that traits
which are more objectively observable or are more independent of opinion are
less prone to logical error or halo effect than are those traits which are
more intangible and hence more subjectively estimated. The implication seems
clear that, by and large, ratings made by the same person are apt to be
contaminated by halo and that in many such instances a single rating of
over-all effectiveness may be as useful as an evaluation based on a composite
of a number of ratings on separate traits.



Peer Rating of Instructor Effectiveness 

Apparently little use has been made of the practice of having teach- 
ers rate their fellow teachers. Roberts and Draper (279) in 1927 obtained 
material on the scope and character of the work of the principal from 
principals' reports from 441 high schools having an enrollment from 5 to
4000 pupils in all sections of the United States. Only 12 principals
asked teachers to rate each other and 379 did not require such ratings; no 
answer to this question was given by the remainder of the principals. A 
survey made by Reavis and Cooper (262) in 1945 on rating methods in use 
in city school systems showed that in only two systems was teacher opinion 
used as part of the rating set-up. 

In a number of studies, however, lists of desirable traits of teachers 
have been compiled by teachers themselves (53, 120, 173, 215, 303). A
detailed analysis of these and other related studies is included in the
section on Opinion Studies of the Personality Characteristics of Effective
and Ineffective Instructors.

Superficially at least, the most obvious way to discover how a man
does a job is to ask a fellow employee. It would seem that fellow-teacher
opinion should provide a valid measure of instructor competence. The
rating a teacher makes of a fellow teacher, however, is probably rarely
based on first-hand observation but rests more often on hearsay and
reputation. Even if he does have opportunity to observe other teachers'
performance in the classroom, he may not know what is important to look for.

Furthermore, peer ratings have never been popular. This is probably
due to the reluctance of persons to evaluate or to be evaluated by their
close associates. The raters can never be absolutely certain that
uncomplimentary opinions do not get back to the person rated, nor are they
always sure just how their ratings will be used. They are loath, for
instance, to accept any responsibility for separating even an incompetent
fellow worker from his job.

For administrative purposes, therefore, peer ratings of instructors
are probably not too useful since teachers tend to have certain misgivings
about passing judgment on fellow teachers. When obliged to rate their
fellow teachers, they are apt to do what is popularly called a "snow job."
They are careful to give only favorable ratings, thus avoiding any
repercussions if their ratings became known to the one rated. This means,
of course, that the instructor-rater keeps his more candid opinions to
himself. From a research standpoint, in using peer opinion, ranks might
give better results than ratings, especially if steps are taken to assure
the raters of the anonymity of the results.



Reliability of Peer Rating of Instructor Effectiveness

Not many data were found on reliability of peer rating of teachers.
Four studies in which reliabilities were obtained for fellow-teacher
ratings are presented in Table 4. In these studies the N used was the
number of instructors rated.



Correlation of Peer Rating with Other Measures of Instructor Effectiveness 

Several investigators have been interested in showing the relationship
between peer rating and other measures of instructor effectiveness.
The rationale for making such comparisons appears to be that of lending
support to the validity of the measure used in a particular study.
Apparently there is considerable agreement in opinions of supervisors and
fellow instructors. This would seem to indicate that the reputation of an
individual is a common element in influencing the judgment of all who are
associated with the teacher whether pupils, fellow teachers, or supervisors.

In the four available studies where correlations were computed, the
coefficients ranged from .53 to .96. These four studies have been
summarized in Table 5, together with three reports where noncorrelational
methods were used in comparing peer ratings with other measures of
instructor effectiveness.



Intercorrelations of Peer Rating of Instructor Effectiveness

As in the case of intercorrelations between traits rated by the same
person for administrative ratings, close relationship is found for ratings
given different traits by the same peer raters in the few studies
available.



In 1922 Knight (178) in a study of 153 elementary and high school
teachers found that mutual judgments of teachers with respect to general
teaching ability correlated with their judgments of intellectual ability
.94 and with judgment of skill in discipline .79, while judgments of skill
in discipline correlated with intellectual ability .86. He concluded that
in judging particular traits, "general estimate" (i.e., halo) influences
the ratings to such a degree that judgments of particular traits are in
themselves of little practical use.

Odenweller (247) also noted that in his study correlations are markedly
higher when both traits are judged by the same set of judges than when
one is judged by one set of judges and the other by another.



Table 4

Reliability of Peer Rating of Instructors

(Table entries not recoverable.)



Table 5

Correlation of Peer Rating with Other Measures of Instructor Effectiveness

(Only one entry recoverable.)

Highland & Berkshire (1951)   635 AF technical school   Average rankings of peers vs. rankings



Student Rating of Instructor Effectiveness

In recent years certain educators have been quite voluble in advo- 
cating the use of student rating in evaluating the effectiveness of in- 
structors. It is maintained that such ratings tend to raise standards 
of instruction by providing a basis for weeding out incompetent
instructors and for improving the effectiveness of good instructors. These
ratings, it is said, provide administrators with a means for securing
dependable information which they should possess as to the opinions of
students with respect to every member of the teaching staff.

That student ratings, within the limits of their reliability, are
valid measures of student opinion of instructors cannot be questioned.
It is probably true also that students, being in a more or less close
relationship with their instructors, are in a better position than anyone
else to make certain judgments of them. Whether or not these student
ratings are in turn related to over-all effectiveness of the instructor
in the teaching situation has not been demonstrated. There may be a
closer relationship between pupils' success in school and their reaction
to the teacher than there is between their success and methods of
teaching or the so-called important physical aspects of the school
environment and teaching aids.

While the practice of obtaining student ratings appears to be growing,
their disadvantages have frequently been pointed out. Some administrators
oppose them because of the cost in time or money or because of
their possible disruptive effects upon student and staff morale. Among
instructors there is considerable opposition to student ratings. Certain
instructors fear the misuse of student opinion as a basis for advancement
or separation of personnel. They point out also that student
ratings may make instructors emotional, self-conscious, or resentful and
that attempts to cater to student opinion may produce changes in
undesirable directions. Students may lose respect for their instructors by
being encouraged to set themselves up as judges of instructor competence.
Instructors contend that student ratings are unreliable because
of immaturity and prejudices of the raters, who are influenced by grades,
interest in specific subject matter, reputation of particular instructors,
difficulty or ease of course material, and the like. Many students also
are unfavorably disposed to rating their instructors. They consider such
ratings a waste of their time unless administrative action results.

Students themselves point out that the preferred instructor is often 
young, genial, and entertaining, while the serious, more experienced in- 
dividual who stresses subject matter and insists upon certain standards 
of deportment and effort is rarely popular. 

Quite a number of investigators have reported studies of student
rating as a measure of instructor effectiveness and also as a means of
instructor improvement. Among these are the studies of Bryan (61, 62,
63, 64, 65, 66), Starrak (322), Riley et al. (276), Goodhartz (131),
and Remmers and his associates (264, 265, 266, 267, 268, 269, 321, 348).
Galt and Grier (126) in a report of an investigation of flying instructors
state that they found student rating useful and suggest that such ratings
might well be looked into further. In a very recent study Flesher (115)
has suggested that the question of whether or not ratings of an instructor
might be inferred from his students' rating of the course taught by the
instructor might well bear investigation. Flesher contends that student
rating of courses tends to be more objective and frank and hence more
valid than their ratings of instructors. In a limited test of this
hypothesis done as a by-product of another study, Flesher obtained
correlations ranging from .60 to .82 between course ratings and instructor
ratings, with mean ratings for courses tending to be lower and more variable.



Reliability of Student Rating of Instructors 

It might be expected that higher reliability coefficients would be 
obtained for composite student ratings than for composite administrator 
ratings of instructors because of the usually much larger numbers of 
student raters as compared with administrators making the ratings. As 
shown by the investigations summarized in Table 6, however, there is 
considerable variation in the reliability of student rating. 
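
The expectation stated above, that a composite of many student ratings
should be more reliable than a composite of a few administrator ratings,
can be illustrated with the Spearman-Brown formula. The review does not
name this formula; it is used here only as an assumption about the kind of
reasoning involved. The single-rater reliability of .20 and the numbers of
raters are hypothetical.

```python
def composite_reliability(single_rater_r: float, k: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of k raters."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


# Hypothetical values, not taken from any study in this review.
administrators = composite_reliability(0.20, 4)   # four administrator raters
students = composite_reliability(0.20, 30)        # thirty student raters
print(round(administrators, 2), round(students, 2))
```

Under these assumed values the larger group of raters gives the higher
composite reliability, which is the direction the paragraph anticipates;
the variation actually reported in Table 6 shows that this does not always
hold in practice.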

It will be noted that two kinds of correlational studies have been 
included in Table 6. In most of the studies the correlation coefficients 
are based on the number of instructors. This, obviously, is the proper N
where reliability of student ratings in differentiating instructor
effectiveness is required. In four studies, Remmers and Brandenburg (267),
Root (281), Smeltzer and Harter (315), and Amatora (4), the reliability
coefficients show the consistency with which the same students rate a
particular instructor, using either the same or different rating devices.
These studies give no information as to the reliability of student ratings
with reference to the instructor differentiation problem since the N used
is the number of student raters and not the instructors rated.

In addition to the studies reported in Table 6, a number of investigators
have reported findings which have a bearing on the reliability of
student rating but in which correlation coefficients are not reported. In 1926
Fritz (123) found that 89 students varied widely in their ability to duplicate
their judgments on two ratings of one teacher obtained on a seven-part scale
a week apart. In 1942 Porter (257) found, in having pupils rate some 27
student teachers, that some classes were considerably more lenient than others.
Porter gave no statistical basis for his finding nor did he consider that
the difference might be due to teacher merit rather than leniency of pupils,
if a teacher taught a better lesson in one class than in another. He
concluded also that pupils tended to agree closely in judgments of best and
poorest teachers but varied widely in their judgment of the middle group,
a finding usually associated with the use of rating scales.

In 1929 Remmers (264), using the Purdue Rating Scale for Instructors,
and in 1934 Starrak (322), analyzing ratings by students of the entire
faculty of Iowa State University, reported that reliabilities obtained compared
favorably with those of the best standardized objective tests. In 1932
Flinn (116) found that when an instructor was rated by four different
supervisors and four different groups of pupils during a ten-year period the
pupil ratings were much more uniform than were the ratings of supervisors.
Flinn's result may simply reflect the fact that the standard error of an
arithmetic mean is a function of the number of cases on which it is based
and that a mean based on four different supervisors could fluctuate more
widely than one based on a presumably larger group of pupils. In 1941
Albert (1) obtained consistent results when 78 high school teachers were
rated by their 1578 pupils. In 1946 Remmers et al. (268) asked 559 engineers
to use the Purdue Rating Scale in rating the best and worst instructors each
had in college. The mean differences between best and worst instructors,
as rated by the total group on the 10 traits of the scale and based on a
total possible score of 100, ranged from 17.5 for personal appearance to
59.4 for stimulating intellectual curiosity. The average difference
between means for the 10 characteristics was 39.6. These results are not
too meaningful in the absence of standard deviations of the ratings of
best and worst teachers.
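
The point made about Flinn's result, that a mean based on four raters can
fluctuate more than a mean based on a larger group, follows from the usual
formula for the standard error of a mean. The sketch below assumes
independent ratings; the standard deviation and the group of 100 pupils are
hypothetical, since the report does not give them.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of an arithmetic mean based on n independent ratings."""
    return sd / math.sqrt(n)


# Hypothetical spread of individual ratings and hypothetical group sizes.
sd = 10.0
se_supervisors = standard_error_of_mean(sd, 4)    # four supervisors
se_pupils = standard_error_of_mean(sd, 100)       # an assumed 100 pupils
print(se_supervisors, se_pupils)
```

With the same spread of individual ratings, the mean of the smaller group
has the larger standard error, so greater uniformity of pupil means need
not indicate that pupils are the better judges.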



Correlation of Student Rating with Other Measures of Instructor Effectiveness



A number of investigators have compared the results of student rating
of instructors with those obtained from administrative and fellow-teacher
ratings. Some have reported the obtained correlations as "validity
coefficients." In a few instances, e.g., Lins (203) and Remmers et al. (269),
pupil gain has been used as the criterion with which comparisons were made.

Table 7 summarizes 21 studies, in 12 of which correlation coefficients
were reported. The considerable differences in magnitude of the coefficients
obtained may be partly explained in terms of the diverse criteria employed,
and in part they may be a function of the small numbers of teachers involved
in most of the investigations. In general, the coefficients reported are
quite high where ratings of teaching efficiency were used for both groups
of judges. When a number of traits were rated, however, quite a wide range
in coefficients resulted. This may have been due to the differing
interpretation placed on the meaning of the traits by different raters.
Results are not always comparable from study to study because of the lack of
statistical controls. It was not always possible to tell from the reports,
for example, when pupils ranked their teachers, if corrections were made for
size of groups. Knight (178) applied such correction, as did Boardman (39),
who changed his ranks to sigma positions. Both got quite high correlations.
Greene's study (135), which showed a high relationship between the teacher's
salary and ranking by pupils, may mean only that pupils were influenced by
academic position.
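
The correction for size of group mentioned above can be sketched as a
conversion of ranks to normal-deviate ("sigma") positions. The report does
not give the formula Knight or Boardman used; the percentile convention
(rank - 0.5) / group size below is an assumption chosen for illustration,
and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_to_sigma(rank: int, group_size: int) -> float:
    """Convert a rank (1 = best) to a normal-deviate "sigma position".

    Assumes the rank's percentile position is (rank - 0.5) / group_size.
    """
    proportion_above = (rank - 0.5) / group_size
    return -NormalDist().inv_cdf(proportion_above)


# The same rank means different things in groups of different size.
print(round(rank_to_sigma(1, 5), 2), round(rank_to_sigma(1, 20), 2))
```

Ranking first among 20 teachers yields a more extreme sigma position than
ranking first among 5, which is why uncorrected ranks from groups of
unequal size are not directly comparable.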

Davenport (92) obtained a low correlation between teachers'
self-ratings and pupils' ratings of teaching on comparable scales. He found
a zero relationship between pupils' ranking of their teachers and the
teacher's self-rating. Davenport suggests that a teacher's actual
teaching may well be different from her philosophy of teaching simply because
such factors as size of class or other classroom factors force her to
compromise.

It is interesting to note that in the two studies where pupil gain was
one of the measures, only slight relationship was found. In the Lins
study (203) the low correlation might be due to the small number of teachers
used in this part of the study or to some selective factor in the manner
of choosing which students would rate each teacher. The traits on which
differences were significant at the .01 level in Remmers' study (269)
were: rating as compared to other instructors in the university and care
of communal apparatus. Those significant at the .02 level were:
supervision during tests and dailies, knowledge of chemistry, returning tests
and dailies, should instructor be kept if suitable replacements are
available.



Intercorrelations of Student Rating of Instructors 

Ten studies in which intercorrelations were obtained between ratings
by students for more than one trait are presented in Table 8. A divergence
of results was evident in the various studies as to how much halo effect
was present, even in cases where investigators used the same scale.
Remmers and his associates (264, 266, 267, 321) in their several studies
on the Purdue Rating Scale for Instructors show very little halo effect.
As can be seen from Table 8 they reported consistently low correlations.
In one study (321) only seven of the 45 intercorrelations proved to be
above .50.



Table 8

Intercorrelations by Trait of Student Rating of Instructors

Investigator                  Teacher sample           Number students  Type of rating                          Correlation
                                                       per teacher

Remmers & Brandenburg (1927)  2 college                32               Purdue scale (10 traits)                -.02 to .62; .25 (average for all 10 traits)

Stalnaker & Remmers (1928)    1 college                94                                                       -.07 to .72; .37 (average for all traits)

Remmers (1929)                115 college              (Not reported)   Purdue scale                            .43 (average for all traits)

Boardman (1930)               8? high school           (Not reported)   Teaching efficiency vs.:
                                                                          Work hardest for                      .73
                                                                          Likes best                            .82
                                                                          Discipline                            .75
                                                                          Learn most                            .89

Bowman (1934)                 21 student               8-40             Seven traits                            .12 to .79
                              30 student               (Not reported)   Purdue scale (10 traits)                .?9 to .90

Remmers (1934)                64 student & 76 college  10               Presentation of subject matter vs.      -.005 & .18
                                                                          interest in subject
                                                                        Stimulating intellectual curiosity vs.  .02 & .12
                                                                          interest in subject
                                                                        Presentation of subject matter vs.      .1? & .19
                                                                          stimulating intellectual curiosity

Starrak (1934)                Entire college faculty   (Not reported)   Graphic (17 items)                      -.06 to .63; .47 (average for all traits)
                                (number unreported)

Heilman & Armentrout (1936)   46 college               17-121           Purdue scale                            .06 to .87 (31 of the 45 r's above .60)
                                                                          Personal appearance vs. sympathetic
                                                                          attitude (lowest r)
                                                                          Stimulating intell. curiosity vs.
                                                                          presentation of subject matter
                                                                          (highest r)

Smalzried & Remmers (1943)    40 student               20-35            Purdue scale                            .29 to .88 (28 of 45 r's above .60)

Henrikson (1949)              (900 ratings)            (150 total)      Effectiveness                           .57*
                                                                        General merit personality
                                                                        Voice merit vs. voice

Amatora                       None--items on scale     (Not reported)   General rating vs. groups of items      .06 to .33
                                rated by students                                                               .51 to .66

In the report of his study made in 1934, Remmers (266) says that his
results emphasize the relative independence of the traits: interest in
subject, Trait 1; presentation of subject matter, Trait 5; stimulating
intellectual curiosity, Trait 10. In this study Remmers, in addition to
the correlations reported in Table 8, determined halo effect by taking
"five samplings of intercorrelations of five randomly selected pupils
against five other pupils for Trait 1 versus Trait 5 and Trait 1 versus
Trait 10." (Correlations were not computed between Traits 5 and 10 for
some reason.) These were the 3 of the 10 traits appearing on the Purdue
scale that were indicated by students as being the most important.
Remmers averaged the r's without conversion to Fisher z's and without
regard to the varying numbers of teachers involved in each r and then
"corrected for attenuation." The resulting "true" correlation of .34,
it seems to the reviewers, may be regarded with more than a little
suspicion. In the case of college students, Remmers reported average r's
corrected for attenuation of .52, .38, and .49 for Traits 1 vs. 5, 1 vs.
10, and 5 vs. 10, respectively.
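
The reviewers' objection concerns two steps: averaging r's directly rather
than through Fisher's z with regard to sample size, and then correcting
the average for attenuation. A minimal sketch of the conventional versions
of both steps is given below. The function names, the weighting by n - 3,
and all numerical values are illustrative assumptions; none of them are
taken from Remmers' report.

```python
import math


def average_r_fisher(rs, ns):
    """Average correlations via Fisher z, weighting each r by n - 3."""
    weights = [n - 3 for n in ns]
    z_mean = sum(w * math.atanh(r) for r, w in zip(rs, weights)) / sum(weights)
    return math.tanh(z_mean)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from two reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical correlations and sample sizes, for illustration only.
rs = [0.10, 0.60, 0.30]
ns = [8, 40, 12]
simple_mean = sum(rs) / len(rs)
weighted_mean = average_r_fisher(rs, ns)
print(round(simple_mean, 3), round(weighted_mean, 3))
print(round(correct_for_attenuation(0.20, 0.50, 0.70), 3))
```

When the r's rest on very different numbers of cases, the simple mean and
the weighted Fisher-z mean can differ noticeably, and the correction for
attenuation then magnifies whatever value is fed into it when the assumed
reliabilities are low.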

In 1936 Heilman and Armentrout (148), also using the Purdue scale,
found considerable halo effect, and Smalzried and Remmers (314), in their
factor analysis study of the Purdue scale made in 1943, report that 28
of the 45 intercorrelations were above .60. Other investigators using
different scales mention that quite a bit of halo effect was found.
Bowman (47), in fact, in a third of a series of studies on student
rating used an over-all rating because of the high intercorrelations among
traits found in his first two studies.



Influence of Grades Received by Students on Their Rating of Instructors 

The meaning of students' ratings of instructors is dependent to some
extent on whether or not such ratings are related to grades received by
students from the instructor concerned. If grades received are related
to students' ratings, presumably instructors who gave high grades would
be expected to receive higher ratings from their students than those who
gave low grades. The presence or absence of the relationships here
considered thus bears significantly on the validity assigned to students'
ratings of their instructors.

The array of correlation coefficients presented in Table 9 is somewhat
bewildering, particularly in the presence therein of coefficients
of substantial magnitude, but in both positive and negative directions.
However, a hypothesis advanced by Remmers et al. (269) in 1949 makes such
results plausible. These authors explain the apparently contradictory results
obtained between this study and one by Remmers (265) at an earlier date in
terms of methodology. In the earlier study the instructor was kept constant
while students were varied in terms of grades and presumably scholastic
ability. In the 1949 study the instructors were varied on the basis of
whether their classes fell short of or exceeded their predicted grade,
presumably a measure of instructor ability. They point out that grades
obtained under a single instructor and due to student differences may be
either positively or negatively related to student ratings but that grades
reflecting instructor differences rather than student differences are
positively related to the ratings given instructors.



Table 9

Correlation of Grades Received by Students with Their Rating of Their Instructors

Investigator                 Teacher sample         Number students  Type of rating            Academic measure           Correlation
                                                    per teacher

Remmers (1930)               7 student & 4 college  16-32            Purdue scale (individual  Students divided into two  -.66 to .29
                                                                       items)                    groups on basis of       (biserial)
                                                                                                 grades                   -.71 to .45

Starrak (1934)               Entire faculty of one  (Not reported)   Graphic scale (17 items)  Grades                     .15
                               college

Bowman (1934)                9 student              6-40             12 characteristics        Grade                      .004 to .65
                                                                                               Difference between grade   -.?9 to .36
                                                                                                 & student average grade

Smeltzer & Harter (1934)     5 college              (Not reported)   Graphic scale (45 items)
                                                                       Anonymous               Final examination          -.20 to .16
                                                                       Signed                  Final examination          -.14 to .17

Kroue (1935)                 (Not reported)         (Not reported)   Analysis of "best" &      Grade                      No significant
                                                                       "worst" teacher                                      correlation

Heilman & Armentrout (1936)  46 college             17-121           Purdue                    "Teacher's severity of     -.04
                                                                                                 grading"*
                                                                                               "Fairness in grading"      -.24

Inna (1937)                  22 sr. high school     20-152           General teaching ability  Grades                     .07
                             41 jr. high school                                                                           .15

* Obtained by computing the mean of all the grades assigned by each teacher for three quarters.

If one assumes that good students will approve of instructors who
conduct their teaching at a high level (and over the heads of the poorer
students), then a positive correlation between student ratings and grades
would result. Conversely, if the instructor pitches his teaching at the
level of the weaker students, the brighter students will disapprove and a
negative correlation will result. This hypothesis would account both for
the range of coefficients obtained and for the fact that when correlations
are not computed separately for each instructor, coefficients of
negligible magnitude are found.
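
The hypothesis can be illustrated with a small constructed example in
which one instructor shows a perfect positive relation between grades and
ratings, another a perfect negative relation, and the pooled data show
none. The grades and ratings below are hypothetical and deliberately
extreme; they are not drawn from any study in Table 9.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical grades (1-4) and ratings (1-4) for two instructors.
grades = [1, 2, 3, 4]
ratings_high_level = [1, 2, 3, 4]   # abler students rate this instructor higher
ratings_low_level = [4, 3, 2, 1]    # abler students rate this instructor lower

r_high = pearson_r(grades, ratings_high_level)
r_low = pearson_r(grades, ratings_low_level)
r_pooled = pearson_r(grades + grades, ratings_high_level + ratings_low_level)
print(r_high, r_low, r_pooled)
```

Computed separately, the two coefficients are large and opposite in sign;
computed over both instructors together, the relation vanishes, which is
the pattern the hypothesis is meant to explain.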

In those studies where grades were assigned "subjectively," i.e.,
where the instructor was directly responsible for the grade a student
received, the relationship between grade and rating may reflect the students'
response to the instructor's affective attitude. The relationship between
student ratings and objective grades, on the other hand, may provide an
indication of the students' reaction toward teaching competence. Another
distinction among studies in this area is whether the correlation is between
mean grades and ratings (where classes are the unit) as in the study of
Heilman and Armentrout (148) or between individual ratings and grades
(where the student is the unit) as in the report of Smeltzer and Harter (315).



Influence of Teacher Factors on Student Rating of Instructor Effectiveness

In addition to the grades a student receives, a number of other factors
have been investigated as having a possible influence on student rating of
teachers. Among factors considered have been age and sex of teacher, length
of students' acquaintance with teacher, length of time teacher had taught
in the school or had taught pupil, pleasurable personal relationship between
student and teacher, and whether or not subject taught by rated teacher was
students' favorite subject. In view of the fact that research involving
these factors has been rather sporadic and that some contradictory results
have been reported, generalizations cannot well be made. The few available
studies are briefly summarized in Table 10.

Brookover (54, 55) in his two studies found what are apparently
somewhat contradictory results. This might be explained by the fact that the
measuring devices used by Brookover differed for the two studies.
Brookover concluded that the nature of the pupils' personal relationships with
their teachers affects their ratings of the teachers' abilities. This
conclusion may be based on a form of halo effect, or more generally, a
persistent response set on the part of the pupils. It is interesting to
note that in the Brookover 1940 study, ratings of 39 teachers by their
students on a scale measuring pleasant personal relationship yielded a
correlation coefficient of .64 when correlated with superintendents'
ratings. Boardman (40), in a study reported in 1930 in which pupils'
rankings of teaching efficiency were correlated with their rankings of
teachers in terms of for whom they worked hardest, the teacher liked
most, the teacher having the best order or discipline, and the teacher
from whom they learned most, found that when other factors were held
constant pupils' liking for the teacher was the largest single factor in
determining judgment of teacher efficiency.



Table 10

Relationship of Teacher Factors to Student Rating

Investigator                 Teacher sample         Number students  Student rating             Teacher factor              Relationship
                                                    per teacher

Kroue (1935)                 (Not reported)         (Not reported)   Names best & poorest       Taught student's favorite   Close relationship between
                                                                       teacher                    subject                     favorite subject & subject
                                                                                                                              taught by best teacher

Heilman & Armentrout (1936)  46 college             17-121           Purdue scale               Experience, age, & sex      No reliable differences

Brookover (1940)             17 high school         12-57            Purdue & person-to-person  Age & sex                   No relationship

Davenport (1944)             ?1 high school         [illegible]      Graphic scale (2? items),  Number semesters student    No significant relationship
                                                                       "How Teachers Teach"       had been taught by
                                                                                                  teacher

Brookover (1945)             66 high school male    (Not reported)   General merit              [illegible]                 Positive relationship
                                                                     Pupil gain                 Length of acquaintance      Positive relationship
                                                                                                  with pupil
                                                                                                Length of time teacher had  Positive relationship
                                                                                                  taught in school
                                                                                                Role in community           No relationship
                                                                                                Pleasurable personal        Low, but significant negative
                                                                                                  relationship

In a longitudinal study of student ratings in which there was some
turnover from year to year, Starrak (322), in 1934, found that rating
scores of teachers tended to increase with successive ratings. This change
was gradual, teachers originally placed in the lowest quarter moving to the
second or third quarter by the end of a two-year period. Whether this
improvement was due to some general biasing factor (such as teachers'
reputations among students) or due to increased effectiveness of the teachers
because of added experience is not clear.



Influence of Student Factors on Student Ratings of Instructor Effectiveness

As in the case of teacher factors, the studies concerned with student 
factors other than grades have been sporadic and not too clearly defined. 
Often they are just a by-product of studies concerned with other aspects 
of student ratings. Available studies have been summarized in Table 11. 
Information on four factors was considered: size of class, sex of
students, age or maturity of students, and intelligence or mental age of
students. By and large the results of the various studies show that these
factors have little bearing on student rating. The curvilinear results
found by Starrak in regard to influence of size of class are of some
interest. It is unfortunate that Heilman and Armentrout did not test for
curvilinearity, as the size of the classes in their study ranged from 17 to
121. Starrak concluded: "On the basis of the ratings, 20 students seem
to be the optimum number for a college class." Although his study was
extensive (ratings were made quarterly on all instructors of the college
and cover several years with a total of 40,000 ratings), it is difficult
to see how the optimum size of a class could be selected merely on the
basis of student ratings.
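
Why a test for curvilinearity matters can be shown with constructed data
in which ratings peak at a middle class size. A linear correlation then
reports no relationship even though one is plainly present. The class
sizes and mean ratings below are hypothetical, and the use of squared
distance from the mean class size is only one crude check chosen for
illustration, not a method attributed to Starrak.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical class sizes and mean ratings with a peak at 20 students.
class_sizes = [10, 15, 20, 25, 30]
mean_ratings = [60, 70, 80, 70, 60]

r_linear = pearson_r(class_sizes, mean_ratings)

# Crude curvilinearity check: correlate ratings with squared distance
# from the mean class size.
mean_size = sum(class_sizes) / len(class_sizes)
squared_distance = [(s - mean_size) ** 2 for s in class_sizes]
r_curved = pearson_r(squared_distance, mean_ratings)
print(round(r_linear, 3), round(r_curved, 3))
```

In this constructed case the linear coefficient is zero while the check
against squared distance is strongly negative, so a study reporting only
a linear r over class sizes from 17 to 121 could miss such a pattern.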

In the case of the influence of the sex of the pupils it might well
be expected that girls and boys would differ in their ratings of teachers
of certain subject matter. It is possible that a woman teacher better
understands the emotions and thinking of girl students while a man teacher
might deal better with boys and that these differences might vary for
different student age groups. To a limited extent the few studies on this
variable appear to support these generalizations though the most
outstanding result is the lack of differences between ratings by the two
groups.

Investigations in which maturity or age of students was one of the
variables studied appear to be unanimous in the conclusion that this
factor influenced ratings very little. It should be pointed out though
that in almost every case a very limited range in age of students was
studied. Usually an investigation covered the range within a particular
college or high school or was concerned with first year students as
compared with advanced students regardless of age. The study by Drucker and
Remmers (105) is an exception in that it dealt with the relationship
between ratings by students and ratings by alumni of at least ten years'
standing. This study is particularly relevant to the frequently raised
objection to student ratings that students are too immature to rate their
instructors and that many years later, as alumni, students will have
different values and will evaluate their former instructors on a different
and presumably better basis. Positive relationship of some magnitude
was found. What differences did occur showed that the students ranked their
instructors higher than did the alumni. The difference was significant for
three traits. It is possible that this might reflect a change in the
teachers, i.e., that they became more effective, rather than a change in
opinion of students as they get older. There was high agreement between the
students and alumni as to the relative importance of the ten traits on the
scale. The Pearson product-moment correlation coefficient between median
rankings of these ten traits by the 251 students and 138 alumni was .92.



Using Student Rating for Instructor Improvement 

There appears to be considerable opinion that, properly used, student
rating has value in bringing about instructor improvement. For example,
Schutte (296), Clem (77), Flinn (116), Riley et al. (276), and
Stuit and Ebel (327), after having students rate instructors on one form
or another, state (generally without adequate research evidence) that
student rating enables instructors to evaluate their courses and teaching
performances and that students' opinions often provide a better basis for
self-study and instructor self-improvement than do the opinions of
supervisors.

At the end of both the first and second semesters Bryan (62), in 1938,
asked pupils to rate 29 junior high school teachers. He used a 9-item,
5-point scale, defined in descriptive phrases. Improvement revealed by
the ratings was reported in terms of the percentage of items showing a
difference between the first and second ratings. In this and subsequent
articles (63, 64, 65, 66) he indicated that most teachers find the
student ratings helpful or, at least, not harmful. This expressed
attitude of the teachers, however, may reflect a positive bias, in that
participation of the teachers in the study was voluntary; thus, the
population studied may have been one that already believed in the
helpfulness of students' ratings.

In 1941 Ward et al. (348), using the Purdue Rating Scale for Instructors,
asked students to rate 40 practice teachers at the end of one month of
instruction and again at the end of the semester. The ratings were used
in diagnosing the weaknesses of the practice teachers and as stimuli for
improvement. On the retest 39 of the 40 teachers showed a gain in rating.
Apparently no use was made of a control group of practice teachers who
did not get information concerning themselves from student ratings
against which changes in the experimental group could have been compared.

Porter (257), who based his opinion on a consideration of pupil ratings
of 27 student teachers obtained in 1942, suggested that supervisors'
ratings may be made more objective by making use of pupil ratings.
Presumably Porter intended that supervisors should utilize pupil
evaluation of practice teaching to support their own evaluation of
practice teachers. Whether or not supervisory estimates thereby become
more objective has not been established.



Self-Rating of Instructor Effectiveness 

Few studies of self-appraisal by teachers have been reported in the 
literature. Surveys of rating practices in the schools also show that 
self-ratings are sparingly used. 

In 1927 Roberts and Draper (279) reported results of a study of
principals' reports obtained from 441 high schools with enrollments
ranging from 5 to 4000 pupils in all sections of the United States. Of
the 398 reporting on the use of self-ratings, principals indicated that
in 86 schools teachers were required to rate themselves, in 3 schools it
was suggested that they do so, and in 309 schools no such rating was
required.

In 1945 Reavis and Cooper (262) surveyed 123 cities in 34 states and
the District of Columbia. Only one of these required a report of
self-appraisal filed for administrative evaluation.

Table 12 summarizes seven investigations. In six of these
investigations, correlations were determined between self-ratings and
certain other measures of effectiveness. Administrative ratings, pupil
ratings, or pupil gain show negligible relationships with teachers'
self-ratings. Seven of the 10 coefficients for different schools
reported by Cooke (81) were .21 or less. Even the largest, an r of .94,
is not significant, having been obtained with an N of only 25 teachers.
The only coefficients significantly different from zero (at the .01
level) are those obtained by Flory (11?) between self-ratings and ratings
by friends. Unfortunately Flory did not report the difference between
means of self-ratings and ratings of friends; hence, he provided no
information pertinent to the question as to the tendency to overrate
oneself. The close agreement between self-rating and principal's rating
in the study by Fichandler (111) might be explained in part by the
teacher's familiarity with the principal's former rating.



The tendency for individuals to overrate themselves is exemplified
in the study of Knight and Franzen (1?9) who, in 1922, asked 110 students
to rate themselves in terms of order of interests and also to rate "ideal"
and "typical" junior students. The correlation coefficient obtained
between self-rating and rating for the ideal was ,/*6 and that between
order of interests for ideal and typical students was -.64. The authors
conclude that the data show a well-marked tendency for a person to
overrate himself when he compares himself with others and that the
tendency still persists when the judgment is independent of comparison
with others.

In only rare instances are an individual's own estimates of his
competence accepted at full value by his superiors. The educational field
appears to be no exception in this respect. On the basis of the few
available studies of self-ratings of instructors, as well as from
self-ratings in general, the obvious, undisguised self-rating scale
technique would seem to offer little encouragement for further
investigation. It is possible, however, that there may be some
justification for further exploratory work with more subtle self-rating
instruments.



Objective Observation of Instructor Performance

The emphasis of present day teacher-training institutions appears to
be less upon selection of a particular kind of person than upon trying to
teach methods of performance that will insure success in the classroom.

The establishment of departments of instructor training at various Air
Force bases attests to the adherence to this approach in the Air Force.
Potential instructors are given training in methodology and provided with
the opportunity to practice the approved techniques under simulated
classroom conditions. In keeping with this emphasis upon instructor
performance, it might be expected that an instructor's effectiveness
might be evaluated by observing what the instructor actually does in the
classroom, provided that the observed behaviors are validated against
other criteria.

Investigations using observational methods to determine differences in
performance of effective and ineffective teachers have been few in number
and have varied widely in design. Brownell (59) points out this lack,
stating that the use of the technique of continuous, or a series of
spaced, observations intended to detect changes in some form of behavior
has been grossly neglected in the research work in this area.

Unfortunately, also, most of these studies have leaned rather heavily
upon the subjective judgment of the observers. In many cases the
investigator himself, and sometimes an administrative official, did the
observing, though there are a number of studies in which specially
trained independent observers have been employed. The observational
methods used include chiefly variations of the time-sampling technique or
check-list records of the presence, absence, or duration of particular
activities. In a very few cases photographic, phonographic, stenographic
reports, and frequency counts have also been utilized. Studies in which
a rating scale was completed by an individual after observing a classroom
situation are not included in this section.



Reliability of Objective Observation



In only a few of the studies using the observational approach was the
question of the reliability of the method considered. Too often it is
thought enough to say that the observer has had practice in observing, or
reliability was assumed on the basis of the fact that the observer was
supposedly an "expert" in the educational field. These assumptions are
made particularly, of course, in cases where the investigator or an
administrator was the observer.

Where reliability was computed, the criterion most generally used
was agreement of independent observers determined by use of a correlation
coefficient or percentage of agreement on the basis of an item-by-item
comparison of records. In a few cases occasion-to-occasion reliability
was computed for the same observer. In Table 13 reliability coefficients
are listed. In general, it may be said that the reliability of planned
observational recording compares favorably with that of other methods.
Anderson and Brewer (7) found that a total of from 300 to 400 minutes of
observation yielded a high degree of consistency in the sampling of
teachers' behavior and that observers were more reliable in recording
"domination" than "integration."
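
The two reliability criteria named above, percentage of agreement on an
item-by-item comparison and a correlation coefficient between independent
observers, can be sketched as follows. This is only an illustration: the
function names and all of the records are hypothetical and are not taken
from any study listed in Table 13.

```python
from math import sqrt


def percent_agreement(record_a, record_b):
    """Percentage of items on which two observers made the same entry."""
    if len(record_a) != len(record_b) or not record_a:
        raise ValueError("records must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(record_a, record_b))
    return 100.0 * matches / len(record_a)


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical check-list records for ten intervals:
# 1 = activity present in the interval, 0 = activity absent.
observer_1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
observer_2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Hypothetical frequency counts of one activity in five lessons.
counts_1 = [12, 7, 15, 9, 11]
counts_2 = [11, 8, 14, 9, 13]

print(percent_agreement(observer_1, observer_2))
print(round(pearson_r(counts_1, counts_2), 2))
```

Percentage of agreement suits presence-or-absence records, while the
correlation coefficient suits counts or durations; the same two functions
would serve for occasion-to-occasion comparisons of a single observer.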



Validity of Objective Observation

The most general criterion of validity of observation has been face
validity. In a few studies, however, different methods of evaluating
the same lessons were compared. In 1930 McAfee (208), who evaluated
teacher efficiency by counting the number of good teaching practices and
the number of poor practices as recorded by one observer on a detailed
rating sheet, obtained a correlation coefficient of .4) between this
evaluation and supervisory ratings for a group of 98 teachers. Shannon
(304), in 1936, compared three methods for measuring efficiency in
teaching. One of these was based on an attention score obtained by
dividing total minutes of observed pupil attention (determined by pupil's
postural attitudes and movements) by total possible minutes of pupil
attention. The other two, which were subjective, although accomplished by
the same individuals as the attention score, consisted of five-point
ratings made on a score card containing 43 rubrics grouped under five
headings, and ranking of the teaching performance of each teacher within
his group. The observer-raters were 14 graduate students who had had
experience in supervision, and the teachers studied were 111 student
teachers divided into eight homogeneous groups. Correlations between
score-card ratings and attention scores ranged from .07 to .61 and
between rankings and attention scores from -.16 to .73, while the
correlations between score-card ratings and ranking ranged from .38 to
.97. It appears that while pupils' attention scores are more reliable
(see Table 13) than the score-card ratings or ranking, they do not compare
as closely with the ratings or ranking as the latter two compare with
each other. Since the rankings determined by the two subjective measures
bear higher correlations than comparisons involving attention scores,
Shannon concludes (gratuitously, it appears to the reviewers) that "the
more subjective means are the better ones of the three included in this
investigation."

In a later paper in 1942, Shannon (307) made another study of the
validity of attention scores. Two seventh and eighth grade classes
composed of 47 boys and 53 girls were used. Observations were made by
three graduate students while material was read to the class. Pupils
were later given multiple-choice tests covering the material read.
Correlations between attention scores and test scores were: for boys,
.67; for girls, .34; for total group, .59. The respective correlations
between test scores and intelligence were .37, .40, and .37, while
attention and intelligence correlated .14, .34, and .21. The author
concluded, "Assuming that the material read ... was new to the children
the evidence is damaging to the validity of the attention measurement.
That it has a slight degree of validity is clear, but that it has enough
validity to warrant its use in judging classroom activity is worse than
doubtful."

It appears to the reviewers that Shannon was unduly pessimistic. Results
showing an attention measure which is somewhat more closely related to
student performance than it is to intelligence have implications
justifying further research. Strictly speaking, Shannon's study does not
pertain to the teaching but rather to pupil factors affecting learning,
since the teaching was the same for all pupils.
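
One way to look at the reviewers' point is a first-order partial
correlation between attention and test score with intelligence held
constant, computed from the three total-group coefficients quoted above
from Shannon (307). The calculation is offered only as an illustration; it
is not reported in the source, and the function name is hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Total-group coefficients as quoted in the text.
r_attention_test = 0.59   # attention score with test score
r_attention_iq = 0.21     # attention score with intelligence
r_test_iq = 0.37          # test score with intelligence

print(round(partial_r(r_attention_test, r_attention_iq, r_test_iq), 2))
```

With these inputs the partial coefficient comes to about .56, which
suggests that little of the attention-test relationship is accounted for
by intelligence, although nothing here bears on the teaching itself.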



Some Significant Observational Studies

The findings of a number of studies using the observational method 
will be reviewed at some length because the results appear distinctly 
encouraging. 

One of the earlier observation studies was that of Barr (16), in 
1929, who set forth to observe characteristic differences in teaching 
performance of good and poor teachers of the social studies. A group of 
47 superior teachers was selected, on the basis of superintendents' and
state inspectors' ratings, from cities with a population of 4000 and 
over. Similarly, 47 poor teachers were selected from cities of less than 
4000, excluding teachers from one- and two-room rural schools. The 
superior teachers were from the "promoted" group, with better training 
and more experience than the poor teachers. The poor teachers were 
rated C- or below, and $05? did not return to their teaching positions 
the following year. The median experience of the good teachers was 
12.3 years, while that of the poor teachers was 3.7 years. An obvious
defect of the design of this study was the failure to hold teaching 
situation constant by holding type of school constant. 

Teaching methods were studied by using a combination of subjective
and objective devices. These included: (1) an annotated stenographic
report, (2) a time-chart record of one or more recitations, (3) an
attention chart for one or more recitations, (4) a time-distribution
study of the major activities of the recitation periods for one week,
(5) a check-list record of one recitation, (6) a comprehensive
questionnaire upon the various practices of each teacher, (7)
superintendents' estimates of the teachers' strengths and weaknesses,
(8) the teacher's self-analysis of her teaching.

Barr found the usual subjectively determined qualitative differences
between good and poor teachers. Strong points of superior social study
teachers included, for instance, knowledge of subject matter, good
technique in asking questions, ability to stimulate interest, and
socialization of class work. Elements of weakness included such items as
no provision for individual differences, formal textbook teaching, no
interest in work, no daily preparation, weak discipline, and no knowledge
of subject matter. Barr mentioned 52 separate traits in listing the
personal qualities of good and poor teachers, including personal
appearance, sincerity, energy and vitality, and speaking voice. Barr's
results may be somewhat suspect since his evaluation of the qualitative
differences may have been unintentionally contaminated by foreknowledge
of the identity of the good and poor teachers. With respect to
quantitative differences he found that correlations between time
distributions of various aspects of class activities and supervisory
ratings ranged from -.23 to .17. Relationships between particular items
on the time-chart record and estimates of teaching success were also
found to be small. Barr concludes that it is doubtful "whether time
expended in class upon such items as those reported in this study are
reliable indices of teaching ability." He indicates that within very
broad limits there appear to be no optimum time expenditures for class
activities and that good teachers function successfully within a wide
range of time expenditures.

Olson and Wilkinson (248), in 1938, attempted to investigate teacher
personality as revealed by the amount and kind of verbal direction used in
behavioral control. They used time-sampling records of responses of 30
student teachers, 25 women and 5 men, to a constant group of children, 13
first grade, 13 third grade, and 13 fifth grade pupils, in a one-room
situation. Each of these grade groups was divided into two subgroups or
classes, equated as nearly as possible for ability. Each teacher was
observed with each one of the subgroups at least once. Ten five-minute
samples per teacher were obtained for each class taught. The frequency
and methods of redirecting children's attention were observed.
Distinction was made between language and gestural responses and between
positive, directive verbal responses as opposed to negative responses. A
"blanket score" was also obtained by noting each five-minute period in
which the teacher adjusted to the class as a whole, rather than to an
individual, in controlling behavior when the attention of an individual
child needed to be redirected. Observations were made by a critic
teacher. Teacher efficiency was obtained for each grade, based on
independent judgments of school principal and critic teacher together
with average ratings obtained on Leonard's Rating Sheet for Predicting
Teaching Success. The coefficient of correlation between the two raters
was .73 for the total score on the scale. The correlation between rated
teacher efficiency and total teacher response score was -.06, between
teacher efficiency and positive teacher response, .59, and between
teacher efficiency and blanket response, -.62. When correlations were
computed between teacher responses of the five most able teachers and
pupils' scores on the Haggerty-Olson-Wickman Behavior Rating Scale,
Schedule B (pupils were rated by principal and critic teacher), the
resulting coefficient was .69. For the five least able the coefficient
was .30. Olson and Wilkinson felt their results indicated that there was
better distribution of attention in terms of pupil need in the case of
the able teachers, and a quantitative analysis showed that the less able
teachers tended to avoid contact with the more difficult cases.
Conclusions based on correlations involving two N's of five each,
however, cannot be taken too seriously.
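
The caution about small N's can be made concrete with the usual t test of
a correlation coefficient, t = r * sqrt(N - 2) / sqrt(1 - r^2), solved for
the smallest r that reaches the .05 level. The sketch below is only an
illustration under stated assumptions: the critical t values are copied
from a standard two-tailed t table, and the function names are
hypothetical.

```python
from math import sqrt

# Two-tailed critical values of Student's t at the .05 level, keyed by
# degrees of freedom (N - 2). Values are from a standard t table.
T_CRIT_05 = {3: 3.182, 8: 2.306, 28: 2.048}


def critical_r(n, t_table=T_CRIT_05):
    """Smallest absolute r that is significant at .05 for a sample of n."""
    df = n - 2
    t = t_table[df]
    return t / sqrt(t * t + df)


def is_significant(r, n):
    """True when |r| reaches the tabled .05 critical value for n cases."""
    return abs(r) >= critical_r(n)


for n in (5, 10, 30):
    print(n, round(critical_r(n), 3))

print(is_significant(0.69, 5), is_significant(0.30, 5))
```

With only five teachers a coefficient must approach .88 before it differs
reliably from zero, so neither the .69 nor the .30 quoted above meets the
.05 criterion.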

Jayne (166), in 1945, compared pupil changes with specific observable
teacher activities. He used 28 teachers of Rostker's (282) study, and an
additional 10 teachers and 95 pupils. Pupil gain for the 28 teachers was
measured by computing residual gain (actual gain minus predicted gain)
for classes in social studies on the basis of eight tests, six of which
were published tests and two composed for the particular course of study.
For the 10 additional teachers, gain was measured after each class had
had a lesson on Alaska, by computing posttest minus pretest and recall
test minus pretest. In this study no single, specific observable teacher
act was found whose frequency or per cent of occurrence was invariably
significantly correlated with pupil gain. "There is," Jayne states, "in
general, little relationship between specific observable teacher acts
and the pupil-gain criterion." The results, however, varied greatly for
different methods of assessing pupil gain.

Jayne noted that analysis of the coefficients of correlation seemed
to indicate that the most significant positive correlations with pupil
gain were those having to do with extent to which questions were based on
pupil interest and experience rather than on assigned text, the extent
to which the teacher challenged pupils to support ideas, and amount of
spontaneous pupil discussion. A composite index score, called "Index
of Meaningful Discussion," based on seven items, correlated .60 with
pupil gain based on a composite of eight tests and .39 with pupil gain
based on two tests constructed for the particular course for the 28
teachers from the Rostker study; however, this score yielded negative
coefficients of -.67 for immediate recall and -.68 for delayed recall
for the 10 additional teachers. Jayne explains this by the fact that the
aims of the lessons in the first study (Rostker's) and the second were
different. The teaching in the first study was of wider scope, while
that of the second was aimed toward recall, making discussion of textbook
material essential. Accordingly, Jayne made up a second composite of items
relating to mere recall of assigned material. This yielded higher
coefficients for the group of 10 teachers (.82 for immediate recall and
.53 for delayed recall) than it did for the 28 teachers (.19 for the
composite of eight tests and -.35 for the course tests). From this it
would seem that teaching procedures that were appropriate and effective
under conditions of the first study may have been inappropriate and
ineffective under conditions of the second study.

Anderson, Brewer, and Reed have made a series of rather exhaustive
studies of teachers' classroom behavior. In the first of their studies in
1945 Anderson and Brewer (6) investigated dominative and socially
integrative behavior of kindergarten teachers. A total of 101 children in
two schools were observed to determine pupil reaction to the differential
behavior of teachers. Among other results, teachers were found to use
domination of individual children more consistently than integrative
contacts; teachers tended to dominate boys more often than girls; the
number of teacher-pupil contacts per hour had little relation to the
numbers of children in the room; from a mental hygiene point of view,
there was "better" teaching in the morning than in the afternoon. It thus
appears that individual children may live in vastly different
psychological environments in the same schoolroom.

In a subsequent monograph in 1946 Anderson and Brewer (7) discussed
results of observations of teachers' dominative and integrative contacts
in second, fourth, and sixth grades. The categories of teacher behavior
observed were largely descriptive and represented activities that made a
difference in the behavior of the children. Fourteen statistically
significant differences between children in the two second grade
classrooms were found. These were reported to be consistent with the
personality differences of the teachers. Pupils of the more integrative
teacher showed significantly lower frequencies of looking up, playing
with foreign objects, in general less conforming and nonconforming
behavior, and more spontaneity, initiative, and social behavior than did
those of the dominative teacher. Teacher contacts in the sixth grade
situation were as frequent as they were in the second and fourth grades.

In a third monograph in 1946 Anderson, Brewer, and Reed (6) report
on follow-up studies of the effects of dominative and integrative
contacts on children's behavior. The dominating teacher was, a year
later, still dominating, but the children who had passed on into the
third grade no longer showed the undesirable personality patterns
formerly noted. Two third grade teachers were also observed, one of whom
had twice as many frequencies of domination in conflict contacts with
individual children and over four times as many such contacts with groups
of children as the other teacher. Within the validity of certain mental
hygiene assumptions, observations of the teachers' classroom behavior
revealed certain strong points and certain weak points. The authors
suggest that the weak points are such that they are amenable to
correction by instituting teacher in-service training programs. As a
result of the work by Anderson discussed in the above references, a scale
for recording dominative and integrative behaviors of teachers has been
prepared and is to be published in a forthcoming issue of the Applied
Psychology Monographs.

In 1952 Ryans (288, 289) reported two studies concerned with factor
analysis of teacher behaviors, one of elementary women teachers (275
third and fourth grade), and one of high school teachers (115 men and 134
women). These investigations are part of the "Teacher Characteristic
Study" being conducted by the American Council on Education and the Grant
Foundation. The purposes of this broader project as outlined are "(1) to
try to determine the personality patterns of teachers (at elementary and
secondary school levels) and (2) to explore the possibility of developing
measures that will reflect, and predict, such patterns as may be found."
The research is limited to the study of the personal qualities of the
teacher on the assumption that certain minima of intelligence and
knowledge of subject matter (and perhaps knowledge of "techniques" of
teaching) are primary requisites for teaching.

In the part reported by Ryans, observers trained over a period of five
weeks recorded observations on a specially devised Classroom Observation
Scale. This scale covered 26 behavior dimensions relating directly to
teacher behavior and pupil behavior (presumably reflecting teacher
behavior). Each of these dimensions of behavior was described in terms of
opposite poles and was assessed on a four-point scale. Each elementary
teacher was observed by at least three different observers on different
occasions. Each high school teacher was observed by at least two
different observers and sometimes by three. Data were factor analyzed by
the centroid method.

The factors obtained for the two groups of teachers did not duplicate each
other entirely although there are points of similarity. Ryans (287)
believes that three correlated factors may serve satisfactorily to
describe teacher behavior at both levels: (1) understanding,
friendliness, and responsiveness on the part of the teacher; (2)
systematic and responsible teacher behavior; and (3) the teacher's
stimulating and original behavior. The three factors show somewhat
different relationships in the two school situations. Factors 1 and 3 are
most highly correlated in the elementary school situation with Factor 2
being relatively independent. In the secondary school situation Factors 2
and 3 are most highly related with Factor 1 being relatively independent.

The work reviewed in the foregoing section constitutes a preliminary
attack which promises to be one of the most productive in this area.
Systematic observation should prove fruitful both as a source of rational
hypotheses concerning the nature of teacher effectiveness and as a
technique for testing such hypotheses. The relevant categories for
observation will of course depend on the particular situation being
investigated. Thus, in Air Force schools, for instance, the observational
technique will probably employ categories which differ from the
categories of observation developed for elementary and secondary school
teacher behavior. The differentiation of those behavior categories which
are related to instructor effectiveness from those which are immaterial
remains to be investigated.



Another approach to the investigation of the effectiveness of
instructors should also be explored further. It is that in which teacher
factors, situation, or method are systematically varied as was done, for
example, in studies (204, 355) of so-called authoritarian-democratic
teaching. It has been suggested that the experimental classroom in which
factors associated with teaching can be manipulated under controlled
conditions may offer greater potentialities for achieving successful
results than do the correlational studies of teaching competence in situ.



Student Change as a Measure of Instructor Effectiveness 

Most educational authorities hold that the primary responsibility of
the instructor is to bring about change in the knowledge, skills,
understandings, attitudes, appreciations, interest, and motivation of his
students. For advocates of this point of view the determination of
instructor effectiveness is logical and straightforward. It consists of
measuring the changes that are produced in students as a result of the
instructor's efforts.

The importance of pupil achievement as a measure of teaching ability
has long been recognized. As early as 1921 Courtis (85) pointed out the
significance of student gains as a criterion of teaching efficiency, as
well as the importance of holding constant extraneous factors. He pointed
out that a comparison of pupils' learning curves for incidental learning
with their curves for direct instruction would provide a means of
evaluating teacher competence. In a later article (86) he cautioned that
any method of measuring teaching effectiveness must involve the use of a
"single-variable" measure. He held that it was necessary to measure the
change in the rate of growth which takes place in the student when the
amount or quality of teaching is the only variable in which change
occurs. Courtis then defined good or poor teaching by the periods when
the actual growth curve showed marked deviation from the theoretical
growth curve. To illustrate the method, an observed growth curve of a
particular function was compared with a theoretical growth curve for the
same function as defined by Gompertz's formula expressing the general law
of biologic growth. The author maintained that, while much research
remained to be done, an exact scientific method had been devised by which
the effects of teaching might be precisely measured.
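
The comparison Courtis describes can be sketched by evaluating a Gompertz
curve and subtracting it from observed scores. The form used here, y =
upper * exp(-b * exp(-c * t)), is the common textbook statement of
Gompertz's formula; the source does not give Courtis's own
parameterization, so the parameter names, their values, and the observed
scores below are all assumptions made for illustration.

```python
from math import exp


def gompertz(t, upper, b, c):
    """Theoretical growth at time t: upper * exp(-b * exp(-c * t))."""
    return upper * exp(-b * exp(-c * t))


def deviations(times, observed, upper, b, c):
    """Observed score minus theoretical Gompertz value at each time."""
    return [score - gompertz(t, upper, b, c)
            for t, score in zip(times, observed)]


# Hypothetical monthly test scores for one pupil and assumed parameters.
months = [0, 2, 4, 6, 8]
scores = [14.0, 48.0, 80.0, 90.0, 97.0]

for month, gap in zip(months, deviations(months, scores, 100.0, 2.0, 0.5)):
    # A positive gap marks growth above the theoretical curve.
    print(month, round(gap, 1))
```

On this reading, periods with large positive or negative gaps would be the
ones Courtis attributed to good or poor teaching, provided that teaching
were in fact the only variable changing.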

Unfortunately the possibility of comparing curves of "incidental
learning" with curves of learning from "direct instruction" seems much
further away today than it did to Courtis in 1921. While there has been
immense progress in the science of measurement, this progress has brought
a realization of the difficulties involved in charting intellectual growth
curves, particularly in an area as ill-defined as "incidental learning."

The first reported attempt to use student change as a measure of
instructor effectiveness appears to have been that of Hill (155) in 1921.
This and subsequent studies can, for purposes of discussion, be divided
into five classes, according to the kind of measure of student change
that was used or suggested: raw gain (posttest minus pretest scores);
achievement or accomplishment quotient; miscellaneous measures; corrected
raw gain (raw gain corrected for initial intelligence, grade, or other
variable); and residual gain (actual gain minus predicted gain).
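
The difference between raw gain and residual gain can be shown with a
short sketch. Here predicted gain is obtained from a least-squares line
relating raw gain to pretest score; that choice of predictor is an
assumption made for illustration, since the studies reviewed predicted
gain from various initial measures, and the class means below are
hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def raw_gains(pretest, posttest):
    """Raw gain: posttest minus pretest."""
    return [post - pre for pre, post in zip(pretest, posttest)]


def residual_gains(pretest, posttest):
    """Residual gain: actual gain minus gain predicted from pretest."""
    actual = raw_gains(pretest, posttest)
    intercept, slope = fit_line(pretest, actual)
    predicted = [intercept + slope * pre for pre in pretest]
    return [a - p for a, p in zip(actual, predicted)]


# Hypothetical class means for five teachers.
pretest_means = [40.0, 45.0, 50.0, 55.0, 60.0]
posttest_means = [52.0, 54.0, 63.0, 62.0, 70.0]

print(raw_gains(pretest_means, posttest_means))
print([round(g, 2) for g in residual_gains(pretest_means, posttest_means)])
```

A teacher whose class gains more than its initial standing would predict
receives a positive residual, whatever the size of the raw gain.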

Among the investigators using raw gain as their criterion or as one
of their criteria are Baird and Bates (14), Barr et al. (20), Betts (33),
Bimson (34), Bowden (46), Brookover (55), Hartmann (146), and Hill (155).
Use of raw gain as a criterion is manifestly inadequate. Teaching is
only one among many factors operating to produce changes in students.
It is necessary, consequently, to hold constant all factors other than the
effects of the particular teaching situation being studied. Since the
early 1930's raw gain has rarely been used or, if used, was one of
several gain criteria.

The accomplishment quotient or ratio, which is the ratio a pupil's
educational age or quotient, as measured by standardized achievement
tests, bears to his mental age or quotient, as measured by standardized
intelligence tests, has been widely used as a so-called objective measure
of teaching efficiency. This ratio allegedly indicates the extent to
which a child is "working up to his ability." Goodenough (129) points
out, however, that there are several sources of error which are likely
to reinforce rather than cancel each other both for individual cases and
in group data. The errors arise from lack of knowledge as to the absolute
zero point in the two measures, from unequal variability, and from failure
to allow for regression due to errors of measurement. As Goodenough (129)
says, "...in spite of repeated demonstrations of the unsound assumption
upon which the method is based, it has proved to be one of the most per-
sistent die-hards in the history of educational psychology." The accom-
plishment or achievement quotient has been used by Barr et al. (20), Coy
(88), Crabbs (89), Simmons (310), and Stephens and Lichtenstein (323).
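
The regression error that Goodenough describes can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a reconstruction of any study reviewed here: every pupil is given a single hypothetical "true" age in months that governs both tests, so every pupil is by construction working exactly up to ability, and independent errors of measurement are added to each test. The sample size, means, and error sizes are invented for illustration.

```python
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical pupils: one true age (months) underlies both measures.
true_age = [rng.gauss(120, 12) for _ in range(2000)]
mental_age = [t + rng.gauss(0, 6) for t in true_age]       # intelligence test
educational_age = [t + rng.gauss(0, 6) for t in true_age]  # achievement test

# Accomplishment quotient: educational age as a percentage of mental age.
aq = [100 * ea / ma for ea, ma in zip(educational_age, mental_age)]

# Although no pupil truly over- or under-achieves, measured AQ is related
# to measured mental age purely through errors of measurement.
r_aq_ma = pearson(aq, mental_age)
print(round(r_aq_ma, 2))
```

Under these assumptions the quotient correlates negatively with measured mental age, so pupils who test bright appear to be under-achieving and pupils who test dull appear to be over-achieving, which is the kind of artifact the objection concerns.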

Certain investigators have attempted to use other student measures as
criteria of instructor effectiveness. Thus, in 1934 Davis (96) used pupil
achievement in terms of passing or failing state high school examinations;
in 1934 Frederick and Hollister (121) used numbers of honor grades and fail-
ing grades; in 1935 Lancelot (192) utilized persistence in taking advanced
courses and grades received in those courses; in 1938 Beaumont (26) em-
ployed number and achievement of students taking advanced courses; in 1945
Cheydleur (75) used ranking of instructors according to the ratio of class
average to group average in college French. While some differences among
instructors were found, the outcome of none of these studies appeared to
be very significant.

The validity of these student measures as criteria of instructor ef-
fectiveness may well be questioned. Whether or not a given student passes
or fails a state examination, or achieves honor or failing grades, depends
upon many factors besides his teacher. The same is true of the ratio a
class average bears to a group average. Where pupils from different schools
are compared, some means must be found for controlling such variables as
size and type of school, equipment and library facilities, and the like.
In all cases where groups or classes of pupils are compared, such pupil
factors as intelligence, motivation, interest, and aptitude of pupil for a
particular subject must also be controlled. The reliability and validity of
the examinations on which the student's grade is based must also be taken
into consideration. The number of students or their persistence in taking
advanced courses and the grades achieved in these courses may depend upon
the enthusiasm of the instructor or the interest he is able to build up in
his students in the elementary courses, or it may be a function of the repu-
tation or competence of instructors teaching the advanced courses. Any
simple measure of student gains that fails to take into account the com-
plexities involved will almost inevitably produce misleading results.

While not strictly concerned with gains, Seyfert and Tyndal (302), in
1934, used a rather unique approach in attempting to evaluate differences
in teaching ability. The subjects were two general science teachers who
had previously been rated best and poorest of a group of seven teachers
by superintendent, principal, and supervisors. Four groups of students
were used: two groups of girls matched for age and score on the Terman
Intelligence Test and two mixed groups with age and score on the Rulon
Science Test held constant. Student achievement was determined in terms
of the mental age necessary in order that a student of the less able of
two teachers may achieve the same score level as a corresponding student
of a better teacher. The difference in teaching ability between the two
teachers was found to be equivalent to about three months of mental growth
on the part of the students.

Lancelot (191) says that mere acquisition tests are not sufficient to 
determine student gains because of the discrepancy between acquisition of 
knowledge on the one hand and its retention on the other. He feels that 
a better and relatively sound criterion of teaching ability consists in
the degree of retention by the students of knowledge taught. While theo- 
retically this may be true, use of amount of retention as a criterion poses 
the additional problem of finding some method for holding intervening 
learning constant. 

The first studies to measure student gains by partialling out factors
other than achievement were those of Moss et al. (235) in 1929, Taylor (331)
in 1930, and Betts (32) in 1933. Moss et al., in studying the efficiency of
chemistry instructors, used classes equated for intelligence and previous
training in chemistry. Taylor corrected for initial score, age, and in-
telligence. Betts, besides using a measure of gain in reading indicated
by the mean of the final scores on the Stanford Achievement Test, studied 
the relationship of various teacher measures with standard deviation of 
the class and measures of heterogeneity and homogeneity of achievement 
which were obtained by combining pupil mean final score and standard devia- 
tion by formulas. He also computed correlations with these teacher meas- 
ures after partialling out factors of age, initial score, and standard 
deviation. He obtained much higher correlations for his teacher measures 
(intelligence, professional information, vocabulary) when the criterion of
"hetero-achievement" was used. He points out the pitfalls of judging gain 
by score alone or by heterogeneity (standard deviation) of the group. The 
latter "can be secured by causing dull pupils to forget some of the things 
they knew initially and by inducing superior pupils to learn. If both 
average achievement and heterogeneity of pupil groups are taken in combina- 
tion, such an influence serves to reduce the composite score because a max- 
imum composite can be obtained only by increasing both concurrently." 

In 1945 Bolton (41) used the ratio of mean pupil achievement to its
standard deviation as a measure of teaching effectiveness. In comparing 
six teachers of United States History for matched groups of pupils, he 
reported that one teacher excelled, having a ratio of teaching effective- 
ness more than four times greater than the teacher next in line, while the 
ratios of the other five were close together. In interpreting Bolton’s 
findings one should avoid the fallacy of the tobacco company that adver- 
tises cigarettes which contain "five times less acid tar." The use of 
ratios based on educational or psychological test scores involves assump- 
tions untrue of such scores, namely that their lower limit represents an 
absolute zero point and that intervals between scores are equal. We can 
never say that one person is four times as intelligent, knows twice as 
much history, or is four times more effective as a teacher than some other 
person. Other investigators who have used corrected raw gain included 
Bimson (34), Day (98), and Georges (12?). 
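
The objection to ratios of test scores can be made concrete. The sketch below uses two invented classes (the scores are hypothetical, not Bolton's data) and shows that the apparent superiority of one teacher over another, expressed as a ratio of mean-to-standard-deviation indices, changes when the arbitrary zero point of the test is moved by adding a constant to every score.

```python
from statistics import mean, pstdev

def effectiveness_ratio(scores):
    """Mean achievement divided by its standard deviation."""
    return mean(scores) / pstdev(scores)

# Hypothetical class scores on the same test for two teachers.
class_a = [48, 50, 52]
class_b = [40, 60, 80]

# Relocating the zero point: add the same constant to every score.
SHIFT = 50

before = effectiveness_ratio(class_a) / effectiveness_ratio(class_b)
after = (effectiveness_ratio([s + SHIFT for s in class_a])
         / effectiveness_ratio([s + SHIFT for s in class_b]))

# "Teacher A is so many times as effective as teacher B" depends on where
# the zero of the scale happens to lie.
print(round(before, 2), round(after, 2))
```

Because the standard deviations are unchanged by the shift while the means are not, the "times as effective" figure has no fixed meaning unless the scale has a true zero.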

Of the several methods used to measure pupil change, residual pupil 
gain (i.e., the difference between actual gain and predicted gain) is be- 
coming more widely used as a criterion of instructor effectiveness. This 
method is really a more refined example of the corrected raw gain criterion 
already discussed. Its main advantage is that a more adequate attempt is 
made to hold constant student factors other than the effect of the instruc- 
tor. The chief disadvantages are its dependence upon the availability of 
valid instruments for measuring student growth, the excessive time required 
to obtain the necessary data, and the rather elaborate statistical assump- 
tions and analysis involved. With all its difficulties, however, this 
appears to be one of the best criteria of instructor effectiveness. 
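
The distinction between raw gain and residual gain can be sketched in a few lines. This is a minimal illustration assuming a single predictor (the pretest score) and ordinary least squares; the studies reviewed used various predictors such as initial score and intelligence quotient, and the data and function names below are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_gains(pretest, posttest):
    """Actual gain minus the gain predicted from the pretest score."""
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    slope, intercept = fit_line(pretest, gains)
    return [g - (intercept + slope * pre) for pre, g in zip(pretest, gains)]

def instructor_means(values, instructors):
    """Average a per-student measure within each instructor's class."""
    grouped = {}
    for value, teacher in zip(values, instructors):
        grouped.setdefault(teacher, []).append(value)
    return {teacher: mean(vals) for teacher, vals in grouped.items()}

# Hypothetical data: six students taught by two instructors.
pretest = [40, 55, 70, 42, 58, 73]
posttest = [52, 63, 75, 60, 72, 84]
instructors = ["A", "A", "A", "B", "B", "B"]

raw = [post - pre for pre, post in zip(pretest, posttest)]
resid = residual_gains(pretest, posttest)

raw_by_instructor = instructor_means(raw, instructors)
resid_by_instructor = instructor_means(resid, instructors)
print(raw_by_instructor)
print(resid_by_instructor)
```

The residuals sum to zero over all students, so an instructor's mean residual gain expresses how far his students ran above or below expectation given where they started, rather than how much they gained in absolute terms.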

Several versions of residual pupil gain where gain was predicted on
the basis of such student factors as initial scores or intelligence quo-
tients have been used by Gotham (132), Jayne (166), Jones (172), LaDuke
(188), Lins (203), Remmers et al. (269), Riesch (275), Rolfe (280),
Rostker (282), and Von Haden (344). These studies will be considered on
subsequent pages.



Difficulties of the Gains Criterion 

Tyler (338) and others, however, have pointed out the difficulties
which attend the use of student gains as a criterion. In the first place,
as was noted earlier, what is meant by gain must be adequately defined. An
instructor is called upon to perform many duties and to accomplish many
changes in his students that are not measurable in terms of subject-matter
achievement. Therefore, any measure or measures of student change based on
gain in subject matter alone represents only a small area of the instructor's
total effectiveness. This objection probably applies less or may not be
applicable at all to the Air Force situation, in which the instructor's
chief concern is the teaching of course material of a technical nature.

Determination of gains attributable solely to the teacher is depend-
ent on the availability of valid instruments for measuring such growth.
If more than just subject-matter learning is to be used, mere use of
achievement test data is manifestly inadequate.

As a practical solution most studies measure student gain on the
basis of subject matter learned on the assumption that it is, if not the
total gain, at least probably representative of the major part of the
teacher's job. Even assuming that the type of gain that is to be measured
is known, there are still difficulties in obtaining a valid measure. If
gains of classes under different schools are compared, use of standardized
achievement tests may only reflect the differences in the teaching program
in use in the different schools and not the ability of the different teach-
ers. In this connection, tests designed to measure the learning achieved
in a given course of study are probably more adequate than the more general
standardized achievement tests. The nature of the subject matter selected
may also make a difference. A gain in spelling may be a less complex meas-
ure than a gain in arithmetic. Judging the effectiveness of a teacher who
is teaching several subjects, such as is usual in the elementary grades,
on the basis of the gain of his students in a single subject field is ob-
viously inadequate.

As another difficulty, an instructor whose students obtained high 
initial scores might show up poorly under a gains measure even if correc- 
tion were made for the high scores. This is because of the limited gain 
possible in the case of high original scores and the increased improbabil- 
ity of making a given gain as the initial score becomes higher. Every test 
has a ceiling, a maximum or perfect score beyond which no one can go. If
a student's score is near the top on the initial test he cannot gain as
much as the person whose score falls near the bottom. This difficulty
can be overcome if regression equations are used to obtain predicted final
scores and if the tests used have high enough ceilings. Analysis of
covariance may also counteract this difficulty.
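
The ceiling effect is easily shown with a toy calculation. In this sketch every student is assumed to learn the same amount (a hypothetical true gain of 15 points), but the test cannot record a score above 100; the pretest scores are invented.

```python
CEILING = 100    # maximum or perfect score on the hypothetical test
TRUE_GAIN = 15   # assumed identical true gain for every student

def observed_gain(pretest, true_gain=TRUE_GAIN, ceiling=CEILING):
    """Gain the test can register when the posttest is capped at the ceiling."""
    posttest = min(ceiling, pretest + true_gain)
    return posttest - pretest

pretests = [50, 70, 90, 95]
gains = [observed_gain(p) for p in pretests]
print(gains)
```

Students who began near the top register smaller gains although they learned as much as the others, so an instructor whose class began with high scores is penalized by the instrument rather than by his teaching.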

The gain of a student with a high initial score for his grade group
is also limited to some extent by the general teaching situation. In most
schools for each subject and each grade there is a definite range of diffi-
culty of material to be taught. This in effect imposes a test ceiling for
that particular grade in terms of the subject content considered to fall
within its range. For this reason a student who has already made inroads
into the subject-matter content for his grade will appear to be making less
progress than a student of lower initial achievement.



Reliability of Student Gain

In Table 14 appear reliability coefficients of measures of pupil
gain as reported by five investigators. It will be noted that, as com-
pared with commonly reported test reliabilities, most of the coefficients
appear to be rather low. Taylor (331) explained the reliability coeffi-
cient of .26 for reading progress in terms of the slight numerical changes
in scores that took place. Rolfe (280) reported a reliability coefficient
of .82 for the initial composite of three Hill tests and a coefficient of
.78 for the final Hill composite, yet the reliability of the change was
only .19. In general, reliabilities for gain tended to be lower than
those reported for either initial or final scores. Rostker (282) sug-
gested that this may have been due to the fact that the gain reliability
coefficients contain errors of measurement derived from both the initial
and final applications of the tests used. In addition, the reliability
of a gains measure is dependent not only on the reliability of initial and
final measures but also on the correlation between them. The higher the
correlation of these variables the lower the reliability of the gains
measure.
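
The dependence of gain reliability on the correlation between initial and final scores follows from the standard formula for the reliability of a difference between two scores. The sketch below implements that textbook formula; the default of equal standard deviations and the pretest-posttest correlations tried are assumptions for illustration, since the correlation between Rolfe's initial and final composites is not given in the text.

```python
def gain_reliability(r_xx, r_yy, r_xy, s_x=1.0, s_y=1.0):
    """Reliability of the difference (gain) between two scores.

    r_xx, r_yy: reliabilities of the initial and final measures
    r_xy:       correlation between initial and final measures
    s_x, s_y:   their standard deviations (equal by default, an assumption)
    """
    true_variance = s_x**2 * r_xx + s_y**2 * r_yy - 2 * r_xy * s_x * s_y
    total_variance = s_x**2 + s_y**2 - 2 * r_xy * s_x * s_y
    return true_variance / total_variance

# Reliabilities of .82 and .78 as quoted above; the correlations between
# initial and final scores are hypothetical values chosen for illustration.
for r_xy in (0.0, 0.50, 0.75):
    print(r_xy, round(gain_reliability(0.82, 0.78, r_xy), 2))
```

With two measures of reliability near .80, a pretest-posttest correlation in the neighborhood of .75 is enough to bring the reliability of the gain down to about .20, which shows how a value as low as the .19 quoted above can arise from two individually reliable tests.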

In general, the statistical computations involved in the estimation of 
the reliability of student gains are equivalent to those involved in esti- 
mating the reliability of differences between test scores. Methods are 
discussed and relevant formulas are given, for example, in Lindquist (202). 

Interpretation of a reliability coefficient rests on the assumption
that it has been obtained as the result of correlating comparable measures
of the same thing and that the variable errors are uncorrelated with them-
selves and with the true scores. If errors are correlated, it follows that
the obtained reliability coefficient will be spuriously high. In this
connection it should be noted that all the correlations reported in Table 14
are split-half. These coefficients show the uniformity of the effect of the
instructor within a single class. They do not give any information as to the
consistency of instructor effectiveness in different classes. Coefficients
of reliability obtained by the split-half method will be increased by any
noninstructor variables that affect a whole class, while class-to-class
correlations would be decreased by such influences.
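
The contrast between split-half and class-to-class coefficients can be illustrated by simulation. The following sketch assumes a simple additive model that is not taken from any study reviewed: each mean gain is the sum of a stable instructor effect, an influence shared by a whole class (for example, a disrupted term), and a chance component specific to each half of the class. All variances are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def class_halves(instructor_effect, rng):
    """Mean gains of the two random halves of one class."""
    class_wide = rng.gauss(0, 1)  # noninstructor influence on the whole class
    return (instructor_effect + class_wide + rng.gauss(0, 0.5),
            instructor_effect + class_wide + rng.gauss(0, 0.5))

rng = random.Random(1)  # fixed seed so the sketch is repeatable
half_a, half_b, first_class, second_class = [], [], [], []
for _ in range(500):                      # hypothetical instructors
    effect = rng.gauss(0, 1)              # stable instructor effectiveness
    a1, b1 = class_halves(effect, rng)    # first class taught
    a2, b2 = class_halves(effect, rng)    # second, independent class
    half_a.append(a1)
    half_b.append(b1)
    first_class.append((a1 + b1) / 2)
    second_class.append((a2 + b2) / 2)

split_half_r = pearson(half_a, half_b)
class_to_class_r = pearson(first_class, second_class)
print(round(split_half_r, 2), round(class_to_class_r, 2))
```

Under this model the class-wide influence is common to both halves of a class and so raises the split-half coefficient, while it differs from one class to the next and so lowers the class-to-class coefficient.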

An investigator may be interested in the effect of the instructor upon
a class as a whole or upon certain types of students within a class. Since
most research in this area has been concerned with the effectiveness of the
instructor with respect to a class, measures used in determining pupil gain
have usually consisted of means for groups of students. The reliability of
average measures of pupil gain based on a group of pupils may differ from
reliability of gain determined for individual pupils.
Reliability of Measures of Student

Correlation of Student Gain with Other Measures of Instructor Effectiveness



Investigations in which attempts have been made to relate measures of
student gain to other presumed measures of instructor effectiveness have
been summarized in Table 15. Reported coefficients range from -.61 to .81.
In more than half of these studies one or more negative correlation coeffi-
cients were obtained. This extreme variability may mean that measures
used were inadequate or that the gains criterion is dependent on factors
other than the teacher such as subject matter taught or pupils' academic
level. On the other hand, in view of the statistical pitfalls awaiting an
unwary user of the student gains criterion, certain of the studies which
show low or negative relationships may merely be reflecting inadequate
research design.

In five of these studies, Simmons (310), Bimson (34), Brookover (55),
Von Haden (344), Remmers et al. (269), correlation coefficients were not
computed, were not significant, or were not available to the reviewers.
Bimson consistently found that pupils of teachers rated above the median
made higher gains than pupils of lower rated teachers, but that greater
progress was made by pupils of lowest intelligence. It should be pointed
out that Bimson (34) determined a progress quotient by dividing the dif-
ference between pretest and posttest scores by I.Q. This procedure appears
highly questionable since it penalizes the brighter students who tend to
make high initial scores. Due to test ceiling the possible gains of these
students are less than possible gains of duller students. This in turn
favors the instructor whose efforts are directed toward the students of
low I.Q. The "little relationship" reported in Table 15 for Jayne's study
(166) is based on the fact that Jayne found significant only 20 or about
six per cent of 336 correlations between frequency scores of observable
instructor activity and pupil gains. In the report reviewed, Brookover
(55) failed to include statistical analyses which were evidently made in
the original doctoral dissertation from which the article was drawn. The
negative association which Brookover found between mean gains in pupils'
history information and the pleasurable personal relationship which the
teacher has with his pupils is what might be expected. The instructor
who spends his time being a "good fellow" with the students probably to
some extent neglects to impart subject matter information.

As one examines the results of correlational studies such as some
of those summarized in Table 15, one wonders what thinking lay behind the
investigations. Some of the variables intercorrelated are so unreasonable
and arbitrary that one suspects they were computed simply because data on
certain variables were available or could be readily obtained. In some
instances, certainly, there exist no psychological or educational grounds
on which relationship between student gain and some of the variables used
might reasonably be expected to exist. Computation of such correlations
was obviously largely a waste of time and their reporting makes no contribu-
tion to our understanding of the relationship of student gains to rated
effectiveness of instructors.







Table 15

Correlation of Measures of Student Gain with Other Measures of Teacher Effectiveness

Investigator | Teacher units (a) | Subject | Measure of pupil gain | Measure of teacher effectiveness | Correlation

Hill (1921) | 13[illegible] elementary | Arithmetic, penmanship, spelling | Posttest minus pretest | Administrator rating (Winnetka) | [illegible]
Hill (1921) | same | same | Posttest minus pretest | Administrator rating (Gary) | .24
Hill (1921) | same | same | Posttest minus pretest | Administrator rating (Detroit) | .19

Crabbs (1925) | Elementary, rural | Reading | Achievement quotient | Average ranking ([illegible] supervisors) | .27
Crabbs (1925) | Elementary, urban | Reading | Achievement quotient | Ranking (1 supervisor) | -.36
Crabbs (1925) | Elementary, rural | Composite 5 subjects (reading, arithmetic, spelling, penmanship, composition) | Achievement quotient | Estimating teaching in general | .33
Crabbs (1925) | Elementary, urban | Composite 5 subjects (reading, arithmetic, spelling, penmanship, composition) | Achievement quotient | Estimating teaching in general | -.26

Baird & Bates (1929) | [illegible] elementary | Reading | Achievement quotient | Principal rating (general merit) | .14

Taylor (1930) | 105 elementary | Reading; Arithmetic; Reading | Posttest minus pretest (initial score, age & intelligence held constant) | Composite administrator ranking & education specialty rating (two entries) | .24; .02; .24; .10

Simmons (1933) | 40 elementary | (Not reported) | Achievement quotient | 3 administrator ratings | Negligible relation

Barr et al. (1935) | 66 elementary | Arithmetic | Posttest minus pretest | Superintendent rating (composite, 7 scales) | [illegible]
Barr et al. (1935) | same | Arithmetic | Achievement quotient | Superintendent rating (composite, 7 scales) | -.04
Barr et al. (1935) | same | Arithmetic | Achievement quotient | Superintendent rating, each of 7 scales | -.13 to .16

Jones (1946) | 13 high school | English | Residual gain | Supervisor rating | -.61
Jones (1946) | 63 high school | 15 high school subjects | Residual gain | Supervisor rating | -.38 & .10

[Investigator illegible] (1949) | 9 (LaDuke) | Community living | Residual gain (original study) | Supervisor follow-up ([illegible] yr.) | .14
[Investigator illegible] (1949) | 17 (Rostker) | Social studies | Residual gain (original study) | Supervisor follow-up (11 yr.) | .35

Remmers et al. (1949) | 53 chemistry laboratory (28 exceed prediction, 25 under prediction); 50 chemistry (28 exceed prediction, 22 under prediction) | Chemistry | Residual gain; Other traits | Student rating, 32 traits | Not significant
Remmers et al. (1949) | same | Chemistry | same | Use of [illegible] apparatus | .01 (b)
Remmers et al. (1949) | same | Chemistry | same | Rating compared with Purdue instructor test | .01
Remmers et al. (1949) | same | Chemistry | same | Knowledge of chemistry | .02
Remmers et al. (1949) | same | Chemistry | same | Returning dailies & tests | .02
Remmers et al. (1949) | same | Chemistry | same | Should instructor be kept | .08
Remmers et al. (1949) | same | Chemistry | same | Supervision during tests | .03
Remmers et al. (1949) | same | Chemistry | same | Coverage of assigned work | .03

Von Haden (1945) | 17 high school women, 1 yr. experience | 6 high school subjects | Residual gain | Supervisor ratings of personal data items | None of 34 r's significant

Lins (1946) | 17 high school women, 1 yr. experience | 6 high school subjects | Residual gain | Composite 5 supervisor ratings | .19
Lins (1946) | same | same | Residual gain | Pupil evaluation of teacher effectiveness | .06

Bimson (1937) | 25 high school | Algebra, general science, history | Achievement quotient | Supervisor ratings | Higher rated teachers show consistently more pupil progress

Brookover (1945) | 66 high school male | U. S. History information | Posttest minus pretest | Pupils' pleasant personal relations | Low significance negative relationship
Brookover (1945) | 64 high school male | U. S. History | Posttest minus pretest | Administrator ratings | No significant relationship
Brookover (1945) | 66 high school male | U. S. History | Posttest minus pretest | Pupil rating of ability | Low, irregular relationship

Gotham (1945) | 57 elementary, 1- & 2-room | Citizenship course | Residual gain | Superintendent, supervisor, observer ([illegible] scales) | .40

Jayne (1945) | [illegible] | Social studies | Residual gain | Frequency of observable activities | Little relationship between specific observable acts & pupil gain

LaDuke (1945) | 31 elementary, 1-room | Community living course | Residual gain, information test | Superintendent rating (3 scales) | .17
LaDuke (1945) | same | same | Residual gain, information test | Supervisor teacher rating (3 scales) | -.03
LaDuke (1945) | same | same | Residual gain ([illegible], appreciation, attitudes, information, interest) | Superintendent rating (3 scales) | .02
LaDuke (1945) | same | same | Residual gain ([illegible], appreciation, attitudes, information, interest) | Supervisor teacher rating (3 scales) | -.25

Rostker (1945) | 2[illegible] rural | Social studies | Residual gain | Investigator rating (3 scales) | [illegible], .2[illegible], .34
Rostker (1945) | same | Social studies | Residual gain | Supervisor rating ([illegible] scales) | -.02, -.01, .15

Rolfe (1945) | 47 elementary, 1- & 2-room | Citizenship course | Residual gain | 3 rating scales | .36, [illegible], .43

Riesch (1949) | 22 elementary | Achievement in social studies | Residual gain (c) | Superintendent rating | .22
Riesch (1949) | same | Personality | Residual gain (c) | Superintendent rating | .20
Riesch (1949) | same | Right conduct | Residual gain (c) | Superintendent rating | .35
Riesch (1949) | same | Social adjustment | Residual gain (c) | Superintendent rating | .24
Riesch (1949) | same | Attitude | Residual gain (c) | Superintendent rating | .01
Riesch (1949) | same | Composite, all 5 [illegible] | Residual gain (c) | Superintendent rating | .31

(a) Exclusive of 1- and 2-room schools.

(b) Level of confidence of difference in mean ratings between instructors whose
classes obtained grades in chemistry higher than predicted and those whose
classes obtained grades lower than predicted.

(c) Tests used: [illegible] Cooperative Social Studies Test;
Washburne Social Adjustment Inventory;
[illegible] Right Conduct Test;
Remmers' Scale for Measuring Attitude Toward Teacher.


The great discrepancies in the findings of investigators who have ex-
amined the student gains criterion emphasize the extreme variability in re-
lationship among criteria used to indicate instructor ability. Apparently,
at least within the limits of the measures so far used, the relationship
between administrative opinion of a teacher's competence and the amount of
subject matter that teacher will impart to her students cannot be predicted.
While there may be no single measure that correlates consistently with
measures of student change, it appears, as Jayne (166) has pointed out,
that a composite index may be found which has high correlation with the
student gains criterion.



THE PREDICTORS — TRAITS AND QUALITIES ASSUMED TO BE RELATED 
TO INSTRUCTOR EFFECTIVENESS 

As might be expected, many research investigations have been concerned
with measuring or assaying those abilities, traits, qualities, and person-
ality characteristics which are assumed to contribute to success in teach-
ing. Assumptions are usually implicit also that the effect of a trait
tends to be constant, that potential instructors can be selected on the
basis of these traits, and that effective and ineffective instructors can
be differentiated in terms of patterns of traits. Traits related to
failure have also been investigated and are summarized in a later section.

Among the traits and qualities of teachers that have been investigated,
studies most frequently have been concerned with the following characteris-
tics: intelligence, scholastic achievement (academic level reached or
grades obtained), knowledge of subject matter, age and experience, cultural
background, teaching ability, teaching aptitude, professional attitude toward
and interest in teaching, emotional stability and social adjustment, and per-
sonality. Attempts have been made to evaluate and relate a teacher's per-
sonality in general to teaching success and also to indicate the relation-
ship to teaching ability of such allegedly specific personality traits as
aggressiveness and control, appearance, considerateness, cooperativeness,
enthusiasm, motivation, objectivity, and reliability. Some factor analysis
studies have also been made (12, 70, 142, 149, 189, 220, 277, 265, 288, 289,
295, 31k) in order to determine to what extent various factors contribute
to teaching effectiveness.

In the following pages the available quantitative studies relative to
these traits and qualities will be summarized. In considering the various
correlation coefficients reported it should be remembered that their mean-
ingfulness may be limited by the use of unvalidated criteria such as ratings,
and their magnitudes may be limited by unreliabilities of the criterion as
well as of the predictors.

Intelligence as Related to Instructor Effectiveness

It would appear at first glance that of the desirable teacher charac-
teristics one of the most important should be intellectual brightness.
That there might be a relationship between teaching ability and intelli-
gence was realized even before the Stanford revision of the Binet-Simon
Intelligence Test popularized the I.Q. and the Army Alpha provided an
easily accessible measure. This implicit hypothesis that teaching effec-
tiveness and intelligence are related is reflected in the correlations
between ratings of these two teacher variables; such correlations may be
high because of halo effect, or more accurately, because of the logical
error of assuming that intelligence and teacher merit are related.

In 1912, for instance, Boyce (48), basing his findings on the rankings
of 32S secondary school teachers by 27 administrators, reported a correla-
tion coefficient of .71 between ranking on general merit and ranked esti-
mate of intellectual capacity. As late as 1929 Baird and Bates (14) se-
cured subjective ratings of intelligence of 444 elementary school teachers
made by their principals with a five-point scale. When general merit
ratings were correlated with estimates of general intelligence a correla-
tion coefficient of .58 was obtained. The corresponding coefficient for
social intelligence was .57. When these coefficients are compared with
those obtained by using more objective measures of intelligence (see
Table 17), the presence of the halo effect in these estimates of intel-
ligence becomes apparent.



Intelligence Test Scores as Related to Instructor Effectiveness

In 55 of the available studies which have appeared in the last 25
years, attempts have been made to relate objective measures of intelli-
gence of the teacher to various measures or estimates of teaching
effectiveness. Intelligence test scores have been correlated with practice
teaching ratings or grades, various administrative ratings, student ratings,
and pupil gains. In the studies mentioned, 17 different intelligence exami-
nations (in some cases two or more) were employed. The American Council
on Education Psychological Examination was used in 12 studies and the Army
Alpha in 7 studies.

In 15 studies (12, 20, 52, 58, 37, 119, 125, 172, 203, 208, 247, 261,
275, 280, 323) negative correlations were reported, the largest being those
of Riesch (275), r = -.34, Jones (172), r = -.26, and Stephens and Lichtenstein
(323), r = -.24. All three of these coefficients were obtained when intelli-
gence of the teacher was correlated with student gains. In 16 investiga-
tions (20, 32, 39, 56, 57, 79, 119, 133, 171, 172, 184, 185, 188, 256, 282,
320) positive correlations with r = .30 or more are reported between teach-
ers' intelligence test scores and various criteria of teacher effectiveness.
The highest relationship, a correlation coefficient of .57 with student
gains, was reported by Rostker (282) for a group of 28 teachers. (LaDuke in
Reference 188 mentioned a coefficient of .61 in the conclusion of his study,
but no zero-order coefficient of this magnitude appears elsewhere in his
report. Between a composite measure of student gains and teacher intelli-
gence he found a coefficient of .43.) Among the 55 available studies in
which correlations are reported between intelligence scores and various
criteria of teacher effectiveness, the number of subjects is often so small
(in one instance, in part of Jones' (172) study, as few as six) that the
correlation coefficients reported have little meaning.
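
The instability of correlations based on very small groups can be gauged with an approximate confidence interval obtained from Fisher's z transformation. The sketch below is illustrative only: it applies the interval to the coefficient of .57 reported for 28 teachers and, as a purely hypothetical comparison, to the same coefficient had it been based on six cases. The function name and the 95 per cent level are assumptions of the sketch, not part of the studies reviewed.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z.

    r:      observed correlation coefficient
    n:      number of cases (must exceed 3)
    z_crit: normal deviate for the desired level (1.96 for about 95%)
    """
    z = math.atanh(r)                 # transform r to z
    se = 1 / math.sqrt(n - 3)         # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low_28, high_28 = fisher_interval(0.57, 28)  # coefficient and N quoted above
low_6, high_6 = fisher_interval(0.57, 6)     # hypothetical: same r, six cases
print(round(low_28, 2), round(high_28, 2))
print(round(low_6, 2), round(high_6, 2))
```

With six cases the interval extends from a moderate negative to a very high positive value, so a coefficient of that size cannot be distinguished from no relationship at all.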

In Table 16 are shown correlation coefficients obtained between scores
on the American Council on Education Psychological Examination and several
criteria of instructor effectiveness. It will be observed that the corre-
lation coefficients reported vary from -.26 to .57. This would appear to
indicate that whether or not intelligence is an important variable in the
success of the teacher depends upon the situation.

In Table 17 appear the 24 studies (8 have 2 entries) in which find-
ings are given for 90 or more teachers. The first 18 entries are con-
cerned with student-teacher groups. With the exception of the Pyle (261),
Breckinridge (52), and Fuller (12>) investigations, most of the studies
report a low positive correlation between intelligence and practice
teaching grade or rating. The last 14 entries relate to groups of teachers
in the regular school situation. Except for Somers (320), Kriner (185),
and Gould (133), these latter investigations appear to show that there is
only a slight relationship between the intelligence and rated success of
a teacher.

It was noted earlier that student grade or achievement is sometimes
negatively related to the rating of teachers by students and sometimes
positively, because some teachers may be better for bright students and
others for dull students. Similarly, the relationship of instructor in-
telligence to instructor competence may be positive, negative, or non-
existent depending upon motivation and ability of students, subject matter,
classroom conditions, and other factors. In correlating instructor in-
telligence with effectiveness, the assumption is implicit that the effect
of intelligence is constant regardless of time, type of student, nature
of subject matter, educational objectives, classroom climate, and the
like. The variety of relationships found by investigators in this area
provides strong support for questioning this assumption. In some cases
too much intelligence on the part of the teacher may constitute somewhat
of a handicap. This is understandable when one considers the possibility
that some teachers may not be able to "get down" to the level of the
student. In a technical school situation this might very well be the
case, especially where civilians having considerable technical or academic
training are employed.

Correlations Between A.C.E. Psychological Examination and
Various Measures of Teacher Effectiveness

Considering the more or less restricted range into which the
intelligence of a public school teacher may be expected to fall
(intelligence quotients with a range of 103 to 126 and an average of 114 as
reported in findings with the Army Alpha¹), for all practical purposes this
variable is of little value as a single predictor of rated teacher success,
inasmuch as it would be used with a population already selected on the
basis of intelligence.

Although no particular relationship is shown between intelligence of
teachers in general and teaching competence, it is possible that in the
case of teachers of more advanced subject matter a significant relationship
might be found. The investigations of Knight (176), Jones (171),
Boardman (37), Ullman (339, 340), and Jones (172), who worked with high
school teachers, might be expected to throw some light on the possibility.
With the exception of the correlation reported by Jones (172), who obtained
a coefficient of -.26 when he correlated intelligence of 19 high school
teachers with student gains, correlations ranged from .10 to .45, the
latter coefficient being obtained by Knight (176), apparently with less
than 36 subjects. It is seen that these correlation coefficients tend
to be somewhat higher and somewhat less variable than those reported for
elementary teachers.

In 1927 Pyle (260) pointed out, ". . . we find that intelligence as
determined by various types of psychological experiments is a just-barely-
perceptible factor in teaching success." The studies involving groups of
teachers of 90 or more which were summarized in Table 17 have largely
supported this generalization to the extent that low positive correlations
have usually been reported. Of 42 product-moment correlation coefficients
between some measure of intelligence of the teacher and some criterion of
teaching success, 37 were positive and ranged from zero to .46 while only
5 were negative, the largest of these latter being -.06.

Intelligence test scores are probably of little value as indicators
of success or failure with respect to teachers of the lower academic
grades. This is probably due to the narrow range of scores involved, the
teachers from whom intelligence test scores have been obtained for research
purposes constituting a highly selected sample of the total population. In
some teaching situations the intelligence factor, however, may make some
contribution when used with measures of other instructor variables as a
predictive device. If one considers the mean scores of instructors teaching
very diverse subject matter (e.g., calculus vs. trade school), significant
differences in intelligence between instructor groups may appear. An
intelligence test score below the minimum found for an instructor of certain
subject matter might well predict lack of success in teaching, for
instance in the more complex levels of such a field as mathematics.

¹ Army Alpha scores range from 97 to 146 with an average score of 122.

In the Air Force technical schools there is some indication that
intelligence may be somewhat more important as an instructor variable. Morsh
and Swanson (232) reported a correlation coefficient of 6 (significantly
different from zero at the .01 level) between Army General Classification
Test scores and supervisors' ratings of 38 instructors of reciprocating
engine courses on the Instructor Description Form (150).

The restriction of range of intelligence which may have kept the 
correlation coefficients low when obtained with elementary or high school 
teachers may not occur in the instructor population of the Air Force 
where the range of intelligence may be much greater than that of civilian 
teachers. It may be expected, however, that intelligence will bear a 
differing relationship to teaching success, depending upon the complexity 
of the course material and the level of student aptitude and experience 
compared with that of the instructor. Consequently, great care must be 
taken in generalizing from one course to another. The correlation of in- 
structor intelligence with the criterion of student gains might well be 
quite different for high level courses, such as the weather courses, in 
which the students are highly selected, as compared with a course such as 
sheet metal. 
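
The effect of restriction of range described above can be sketched
numerically. The following is a minimal illustration, not a computation from
any study reviewed here. It assumes direct selection on the predictor, a
linear relationship, and equal error variance throughout the range (the
usual assumptions behind the standard range-restriction formula); the
starting correlation of .50 and the ratios of standard deviations are
hypothetical values, as is the function name.

```python
import math


def restricted_r(r_full, sd_ratio):
    """Correlation expected in a group selected on the predictor.

    r_full   -- correlation in the unrestricted population (hypothetical)
    sd_ratio -- restricted SD of the predictor divided by unrestricted SD
    Assumes linearity and homoscedasticity (standard range-restriction
    formula for direct selection on the predictor).
    """
    if not 0 < sd_ratio <= 1:
        raise ValueError("sd_ratio must be in the interval (0, 1]")
    numerator = r_full * sd_ratio
    denominator = math.sqrt(1 - r_full ** 2 + (r_full * sd_ratio) ** 2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical: a correlation of .50 in an unselected population.
    for ratio in (1.0, 0.75, 0.5, 0.25):
        print(f"SD ratio {ratio:.2f}: expected r = {restricted_r(0.50, ratio):.2f}")
```

Under these assumptions, halving the spread of intelligence scores roughly
halves the obtained coefficient, which is consistent with the suggestion that
low correlations among civilian teachers need not carry over to a less
selected instructor population.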



Education as Related to Instructor Effectiveness 

From 1905 to 1951 some 26 studies were made of the relation of amount
or kind of education of a teacher to success in the classroom. In 9 of
these studies statistical relationships between some criterion of instructor
efficiency and amount of education were determined. These investigations
have been summarized in Table 18.

Results of these studies are difficult to interpret. In the great 
majority of the investigations, the range of education is given but the 
variability in the amount of education for the teachers studied is not 
indicated. As in the case of intelligence the restriction of range in 
the amount of education tends to lower the obtained correlation. Also 
the criterion used in most of these studies is highly suspect and any 
relationship found may primarily reflect contamination in the criterion. 

The two highest correlations were one of ,ti2 found by Knight (178) and one 
of .41 reported by Davis and French (97). In the Knight study the education
measure was that of amount of in-service training taken and the relationship
may only reflect the extent to which raters look with high favor on such 
training. Davis and French compared official ratings reported to a state 
educational department with amount of professional training. Here again 
the raters were probably aware of the amount of training each teacher had, 
and such knowledge may well have influenced their ratings. 

Another source of error in studies comparing amount of education
with teaching efficiency is that often the factors of age and years of
teaching are not held constant. Frequently the teachers with the "poorer"
educational background as defined in the different studies belong to the 
group of older teachers so that factors other than amount of education 
may be operating to result in their getting a lower rating. 

Some of the studies reported are too old to have much significance 
for present day education. The variables, elementary teaching and col- 
lege education for instance, have changed radically since 1905. The 
studies are of some historical interest, however, and may also be used 
to see if any changes have occurred. It is interesting to note that in
1905 Meriam (225) said, "Professional work in Normal Schools does not
contribute as much as one would expect, though Normal School graduates 
do better than teachers in city training schools, and these in turn 
better than teachers with no professional education." Then in 1938 Allen 
(2) in a study of 60 superior and 60 inferior teachers makes the following 
similar statement, "After a relatively high minimal background has been 
reached in such items as are normally stressed in substantial teacher- 
training programs, further addition to these backgrounds are not necessarily 
the things which differentiate superior from inferior teachers." 

In 19M Daniel (91) reported a study in which educational levels of 
teachers rated "excellent" were compared with the percentage of all
teachers of their state having the same educational level. He asked a large
sampling of superintendents, supervisors, principals, teachers, pupils, 
and patrons of schools in South Carolina to indicate their "best" teachers. 

In Table 19 is shown the percentage of "best" teachers as indicated by 
pupils and patrons (parents) for the various educational levels and per- 
centages of the teacher population for the state as a whole. Unfortunately, 
these data do not necessarily show that teachers with better education are 
really better teachers. They may have been rated "best" because of their 
education. 

In 1951 Ryans (286) found no significant differences when 275 elementary
teachers were divided into groups based on amount of college training. The
criterion of teaching effectiveness was factor scores obtained when composite
observer rating was factor analyzed by the centroid method. The contingency
coefficient based on 191 cases was .11.
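
For readers unfamiliar with the contingency coefficient, the sketch below
shows how the coefficient of mean square contingency, C = sqrt(chi-square /
(chi-square + N)), is obtained from a table of counts, and what chi-square a
given C and N imply. The two-by-two table is hypothetical and is not Ryans'
data; the function names are likewise illustrative only.

```python
import math


def chi_square(table):
    """Pearson chi-square and total N for a rectangular table of counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def contingency_coefficient(chi2, n):
    """Coefficient of mean square contingency, C."""
    return math.sqrt(chi2 / (chi2 + n))


def implied_chi_square(c, n):
    """Chi-square implied by a reported C and number of cases."""
    return n * c ** 2 / (1 - c ** 2)


if __name__ == "__main__":
    # Hypothetical counts: rows = training level, columns = rated group.
    chi2, n = chi_square([[10, 20], [20, 10]])
    print(f"chi-square = {chi2:.2f}, N = {n}, C = {contingency_coefficient(chi2, n):.2f}")
    # Chi-square implied by the reported C of .11 on 191 cases.
    print(f"implied chi-square = {implied_chi_square(0.11, 191):.2f}")
```

A coefficient of .11 on 191 cases corresponds to a chi-square of only about
2.3, which helps explain why no significant differences were reported.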

Considered as a group the investigations of semester hours or years of 
education as related to instructor efficiency have shown that any relationship
that may exist is slight. Results of these studies suggest that further 



Table 19

Educational Qualifications of "Best," White, High School Teachers(a)

                                   "Best" teachers     South Carolina
Educational level                     N        %         teachers, %

High school graduation or less        1       0.5            0.5
2 years of college                    2       0.9            0.3
3 years of college                    3       1.5            0.9
Bachelor's degree                    45      21.8           61.5
Bachelor's degree plus               98      47.6           24.4
Master's degree                      20       9.7           10.2
Master's degree plus                 37      18.0            2.0

(a) Taken from a study by Daniel (91).



search along lines followed here for factors which differentiate the
effective from the ineffective teacher would probably not be too rewarding.

Such variables as "years of education" or "semester hours" lack mean- 
ing unless psychological or educational changes induced in individuals 
undergoing training can be measured. Whether or not a teacher has had a 
course in educational psychology has little significance because of the
variation in such courses from college to college and even from instructor 
to instructor within a given college. We learn from these studies, what 
we might have suspected from the beginning, that the amounts of education 
or semester hours are meaningless variables in relation to measures of 
teacher effectiveness. Progress in research in this area can be made 
only when more specific and detailed measures of the effects of training 
are developed as variables and substituted for the gross indications of
educational achievement used heretofore. More meaningful variables might 
be provided, for example, by using direct measures of the outcomes to be 
expected from given amounts of training of a specific kind such as might 
be associated with child psychology, psychology of learning, or other 
subject matter courses. 

On the basis of what has been reported to date, however, it can only
be said that beyond certain more or less obvious knowledge requirements,
greater or lesser education of the teacher in terms of courses or semester
hours seems to be unimportant. Where any substantial relationship has
been shown, the possibility of contamination of data has not been eliminated
since a school administrator's rating of a teacher may be influenced by
what he knows about that teacher's training. There is some suggestion
from the text of a number of articles that the primary motivation for
research lay in the educator's enthusiasm for some particular course
or combination of courses in his institution. It is thus perhaps
inevitable that some of the results received somewhat less critical
interpretation here than they deserved.



Scholarship as Related to Instructor Effectiveness

In the search for variables which might be used as bases for the
prediction of teaching effectiveness, one of the most obvious indicators in
terms of accessibility and objectivity would appear to be that of previous
scholarship. The hypothesis is rather widely held that the individual
who is himself a good student of mathematics, for instance, can impart his
mathematical information to others. In line with this assumption, in Air
Force technical schools instructors are frequently selected on the basis
of grades they obtained in particular subject matter courses. Another
school of thought maintains that knowledge of subject matter is not as
important as knowledge of teaching methodology, thus assuming that the
student teacher who excels in practice teaching or in courses in methods
will automatically become a good teacher.

In the attempt to relate scholarship to teaching competence two types of
studies have been made. The first of these involves the investigation of
academic grades received by student teachers as they are related to
standing in practice teaching. The second type concerns the competence of
teachers in the school situation as related to their earlier scholarship in
terms of grades received in school or college, including general scholarship,
standing in academic major, professional education and methods courses, with
particular emphasis on grades in practice teaching.

The usual measure of scholarship is expressed in terms of grade-point
average or grade-point ratio, which is grade weighted by the number of
hours or units credit in the course. In Tables 20 and 22, various
designations used by investigators (general scholarship, marks, average
grades, honor point ratio, academic average, etc.) have all been interpreted
by the reviewers as the college scholarship variable.
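
As a brief illustration of the grade-point average just described, the
sketch below weights each grade by the credit hours of its course. The
four-point scale and the sample courses are assumptions made for the
example; the studies reviewed used a variety of grading scales, and the
function name is hypothetical.

```python
def grade_point_average(courses):
    """Credit-weighted mean grade.

    courses -- iterable of (grade_points, credit_hours) pairs, with grades
               on an assumed 4-point scale (A = 4.0, B = 3.0, and so on).
    """
    courses = list(courses)
    total_hours = sum(hours for _, hours in courses)
    if total_hours <= 0:
        raise ValueError("total credit hours must be positive")
    weighted_points = sum(points * hours for points, hours in courses)
    return weighted_points / total_hours


if __name__ == "__main__":
    # Hypothetical record: an A and a B in 3-hour courses, a C in a 2-hour course.
    record = [(4.0, 3), (3.0, 3), (2.0, 2)]
    print(f"grade-point average = {grade_point_average(record):.3f}")
```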



Practice Teaching Grades versus Scholarship

Many attempts have been made to relate practice teaching grades to
scholarship in an effort to obtain some basis for forecasting success in
practice teaching. By implication, a good standing in practice teaching
would indicate probable success later in the school situation itself.

Of some 31 studies of teachers in training available to the reviewers,
23 report correlations obtained between some measure of average college
grades and grades or ratings in practice teaching, 16 report correlations
between standing in specific college courses and practice teaching, and
9 report correlations found between high school scholarship and practice
teaching. The results of these studies are summarized in Table 20. It
will be noted that the correlation coefficients shown are all positive, and
in several instances where comparatively large groups are involved they are
quite substantial. There is greater variability in the case of the
coefficients found when grades in specific courses are compared with practice
teaching than when the average for all college courses is so compared. This
variability probably has little meaning due to differences in sizes of
groups used and in methods of obtaining the original data.

The implication is quite clear, however, that grades a student will
obtain in a practice teaching course may to some extent be predicted by
the grades that student obtained in college. Unfortunately, there is no
indication in the studies reviewed that steps were taken to keep the
measures of practice teaching experimentally independent and uncontaminated.
In other words, persons assigning practice teaching grades were apparently
not kept unaware of the grades obtained by the students in other college
courses. This means that the positive correlations in Table 20 may be
attributable in part to the operation of logical error or halo effect. The
instructor who grades his student on practice teaching may give higher
grades to the student he knows to have received higher grades in his
previous college work. On the other hand, in the light of the positive
coefficients found regardless of the course or courses correlated with
practice teaching, general scholarship may be the determining factor. It is
probable, too, that both performance in practice teaching and general
scholarship are related to intelligence level. The importance of this
relationship depends, however, on the extent to which practice teaching
grades predict later success as a teacher. The research on this question
is reviewed in the next section.

With the exception of one study, Somers' (320), the coefficients
reported for high school standing, though positive, are rather low. From
this it would appear that while some positive relationship is found for
groups, little prediction of success in practice teaching may be made on
the basis of an individual's scholastic record in high school. Although
again the investigators do not state whether or not the persons assigning
the practice teaching grades were kept unaware of the student's high
school standing, the probabilities are that halo effect was not present to
any great extent here. It is doubtful if, in most college situations,
college instructors are aware of their students' high school grades.
However, it is also true that there is very little variability in the high
school grades of college students, since the better students tend to go
on to college. This latter factor would operate to lower the correlation
coefficients obtained.



Scholarship versus Teaching Success in the Field

The second broad approach in relating scholarship to teaching ability
is that of considering high school or college records of teachers who are



Table 20



Relation of Practice Teaching Grades or Ratings to Scholarship





Investigator


Number of student teachers


High school grades or rank


College grade average


Major subject


Educational method


Other courses


Keen k Iblley <1916) 


60 


.24 


.19 


.57 


FonJyce (1919) 


123 




.61 


.70 




Whitney (1922) 


700 


.27 


♦39 




.21 


Socwi 4 (1923) 


156 










Cooper (1924) 


107 




•33* 






Hamrin (1927) 


106 




• 65 






Shulls (1926J 


108 


.06 


• 63 






2ant (1926) 


200 








02 .3C (?'y chology) 


Broom (1929) 


H6 








.2) 


Morris (1929) 


60 




• 55 






ULLren (1930) 


116 




*26 


.22 


.46 


Whitney 4 Preeler (1930) 


100 (selected) 




• 47 






70 (control) 




• 52 






Breckinridge (1931) 


420 (beginning course) 


.26 










(adranced courie) 


.12 








Reel k Meed (19)1) 


*4 




07 


,49 




Broom (1932) 


235 (trade) 
232 (grade) 




.58 




.45 




235 (reting) 




09 




.04 


Broom 4 Ault (193?) 


55 (rating public 








.11 


school) 

4fl (retire public 




.44 










school) 












63 (reting college) 
66 (reting college) 




.53 




.22 


Cm# k Cornell ,(1933) 


9M (spprox. ) 


.09 








WxSd (1933) 


90 




05 






Hatcher v!934) 


20 




.25 






Butler (1935) 


2a 




.60 




.23 




118 




♦46 




.43 


Irlner (1935) 


55 


.3) 


.52 






Bent (1937) 


577 


.21 


.46 


•45 


.27 ,29 (English) 


Lawton (193?) 


705 (1932) 




.Mfc 

3 








528 (19)6) 
497 (19)7) 








Vertln (1964) 


123 


.07 


•1.2 






Hull (l945) 


100 




.49 


.45 


♦ 51 06 (Minor subject) 




76 




05 


.u 






67 








•lb (Miner subject) 


Sc ego* (1915) 


25 




.52 




.67 




2) 






• 5> 




Fuller (1946) 


65 


.0) 










5) 






.62 




Schvertt (1950) 


36 




02 




.56 


Bech (1952) 


76 




.6? 




.19 



* Coefficient of mean square contingency 
b College leering exrsl nation

now on the job. Some 49 such research studies have been examined. The
results of these investigations are discussed in the following pages under
two headings: (1) Practice Teaching Grades versus Teaching Success in the
Field and (2) Other Academic Grades versus Teaching Success in the Field.
The latter section includes general college average; grades in major
subject, education courses, and other specific college courses; and high
school grades or rank.



Practice Teaching Grades versus Teaching Success in the Field

In 31 available studies practice teaching grades or ratings were
compared with some criterion of on-the-job teaching success. In 29 of these,
summarized in Table 21, the correlation coefficients ranged from -.17 to
.84. The .84 coefficient was obtained by Tudhope (336) in a study of 50
male teachers in England. This investigator's data probably reflect
contamination due to the rating of teachers in service by the same official
inspectors who participated in assigning practice teaching grades.

As indicated in Table 21, with two exceptions, Broom and Ault (56)
and Jones (172), all of the available studies reported a positive
relationship between practice teaching grades and criteria of success in the
field. Most of the correlation coefficients are low, however, only six
being .40 or better.

Upon examination of Table 21, it will be noted that many of the
investigators used a teacher population of under two years' experience. It
might be expected that if grade in practice teaching was predictive of
later success in teaching, a larger correlation would be found in those
studies with the less experienced teachers. Presumably after about two
years of experience, a selective factor has entered the picture, the
failures and teachers who have not adjusted to the teaching situation
having been eliminated. This hypothesis does not stand up under the
results as presented in Table 21, however, as many of the studies with
inexperienced teachers report extremely low correlations. In fact, those
correlations reported in studies whose population included the more
experienced teachers are equally as high as many reported in studies with
inexperienced teachers. These results might be partially explained by
the inadequacy of the criteria used. In the great majority of these
studies some form of administrative rating was employed. Since there
appears to be a definite tendency of administrators to withhold high
ratings from beginning teachers, their ratings may be forced toward the
lower end of the scale, thus curtailing the range of the sample studied.

In only one of the studies, Seagoe (298), were the teachers ranked rather
than rated. Seagoe obtained a correlation coefficient of .49 using the
criterion of teachers ranked within their own faculty, the ranks being
converted to percentile scores for analysis. In two of the studies of
inexperienced teachers less fallible criteria were used. Coxe and Cornell
(87) reported a correlation coefficient of .28 (N = 112) for
trained-observer rating while Lins (203) obtained a coefficient of .25
(N = 58) for observer rating and a coefficient of .21 (N = 17) for pupil
gain when these measures were correlated with grades in practice teaching.



Table 21



Relation of Practice Teaching Grades or Ratings
to Teaching Effectiveness in the Field


Investigator


Teacher sample


Measure of effectiveness


Correlation


Kerian (1905) 


1165 elementary 


Komal enhool principal 


.14 






a animation 




Vovij (1918) 


107 MR 


Salary 


.25 




92; woman 


Salary 


.25 


Vhltnay (1922) 


700 with 1 hi . experience 


Supervisor rating 


.24 


Satin (1923) 


110 with 1 yr. experience 


Principal rating 


.70 


Hear in l 1927 ) 


100 with 1 yr. experience 


Supervisor ratings 


.06 (1st critic teacher) 
•23 (2nd critic teacher^ 


Ament rout (1920) 


200 vlth 1 yr, experience 


Superintendent rating 


.29 4 






Superintend ant rating 


.40* 


Pyla (1920) 


99 with 2 yr. experiepce 


Administrator rating 


.15 


Shu It » (1926) 


50 with 2 yr. experience 


Superintendent rating 


.32 


V/genhoret (1930) 


191 with 1 yr. experience 


SuparlntaJHont rating 


.23 


*Ut" (mo) 


90 elementary 


Supervisor rating 


•If 




112 elementary 


Classroom obee: > vir 


.26 


oilmen <mo) 


116 high school, 1 eca. 


Average principal A 


.36 




experience 


auporlntendert rating 




Bossing (1931) 


100 high ech-xil 


Administrator rating 


.69 


Broom (1932) 


230 


Administrator rating 


.26 


Broom & Ault (1932) 


30 to 63 with 1 yr. 


Rating a sent Department 


.02 to .30 




experience 


Education 






29 to 30 with \ yr. 


Ratings sent College 


-.17 to .10 




experience 


Placement 




Coxt 1 Cornell 0?^ 


500 (approx.) elementary. 


Administrator rating 


.13 




1 yr. experience 
400 (approx.) elementary, 


Admin 1 a tr at or rating 


.21 




2 yr. experience 
112 alimentary, 2 yr. experi- 


Composite observer rating 


.20 




ence 






Irina r (1935) 


55 with 1 yr. experience 


Administrator rating 


.39 


Hardssty (1935) 


23) 


Super interdent rating 


.07 


Odenveller (1936) 


560 elementary 


Supervitor rating 


.19 


Iriner (193?) 


42 (4-yr. courea) 1 yr. 


Administrator rating 


.40 




experience 

94 (2-yr. course) 1 yr* 


Administrator rating 


.34 




experience 






Sanifortl, it >1. (1937) 


242 


Composite 7 inspectors 


.35 


Stewart (1740) 


Rural ( number not re- 
ported) 


Superintendent rating 


.21 


Tudhope ( 1942) 


93 with 3 yr. experience 
plus 


Inspector rating 


.61 


1-rlin (19U) 


123 with 1 y. experience 


Superintendent rating 


.10 


Seagoe (1946) 


25 elementary, 2 yr. 


Supervisor ranking (per- 


.49 




experience 


centile) 




Jones (1946) 


52 high acbocl 


Supervisor rating 


-.04 




32 high school 


Pupil gain 


.13 


Lira (1946) 


50 high school women, 1 


Composite rating (5 


.25 




yr. sxperlence 


observer) 




50 high school women, 1 


Student rating 


.06 




yr. experience 
17 high schorl women, 1 


.21 


Pupil gain 




hr. experience 




Could (1947) 


113 with 1 yr. experience 


Principal rating 


.66* 


Stephen a 0 UchVenAtein (1947) 


06 elementary 


Pupil gain 


.01 


SchwarVi (1950) 


10 with 2 yr. experience 


Supervisor rating 


.06 


Bach (1952) 


73 high echool, 1 earn. 


Principal rating (2 dif- 
ferent scales) 


.06 and .20 




experience 








Superintendent rating (2 


.10 and ,12 



air f •rot riUrt, aaao 
Male) 



* Coefficient of mean square contingency.

As part of a study concerned with the relation of practice teaching
success to other measures of teaching ability, Bach (12), in 1952, sought
an answer to the question, "Is there any agreement in the factor patterns
of critic teacher and principal ratings?" A device consisting of 13 items
arranged on a five-point scale was used. Ratings were made by the critic
teacher while the student was engaged in practice teaching. After four
months in an actual teaching situation, ratings were again made by the
beginning teacher's principal. As a result of factor analysis four factors
were found for each of these ratings as follows: for practice teaching
rating, pupil response, technical competence, relations with others, and
personal appeal; for beginning teacher rating, technical competence,
cooperative attitude, initiative, and personal appeal. In conclusion
Bach states:

"There is considerable agreement between two of the four common
factors found in the analyses of the practice teaching and beginning
teacher ratings, but there are nonetheless important differences.
These two factors are interpreted as Technical Competence and Personal
Appeal. The correlations between these two factors were .27
for the practice teaching rating and -.02 for the beginning teacher
rating. High positive relationships are also found between three
pairs of factors in the practice teaching analysis but only one large
positive and three small negative relationships are found between the
factors in the beginning teacher analysis. The above differences lead
to the conclusion that in spite of the similarity of name in the two
factors common to each analysis, critic teachers and principals are
emphasizing different characteristics or abilities in the people they
train and hire, or else they place different values upon and seek
different combinations of the same abilities" (12).

From the results reported in this section, one could anticipate that
research with Air Force personnel might show some relationships between
standing in instructor training courses and subsequent performance as an
instructor. If such correlations were shown for Air Force technical
training school instructors, however, the information would become available
too late to have much practical predictive application for the instructor
sample used but might have implications for future instructor samples.



Other Academic Grades versus Teaching Success in the Field 

In 35 available studies correlations are reported which are based on
scholarship or grades received by teachers while students as compared with
various criteria of the effectiveness of teachers in service. These
investigations have been summarized in Table 22.

With respect to general college average the correlation coefficients,
with the exception of 4 studies, Meriam (225), Coxe and Cornell (87),
Jones (172) and Bach (12), are all positive but range from zero (Broom
and Ault in Reference 58) to .73 (Somers, Reference 320). For the most

Table 22

Relation of Scholarship to Teaching Effectiveness in the Field






achor tuvU 






H& 5SCiTiS3r 2stal “ ,l!l fcsin5 

isb&l ma it Btisx -tovm. &to£smiiL 



Karlaa (1905) 


ll®5 elementary 

(Ruaibar unreported) 
•leuwntery 


Bormal school principal 
estimation 




-.09 






Hoed/ (1918) 


527 wan 
1C? MO 


Salary 

5U*ry 




.25 

05 






Rittar (191®) 


1436 a 1— antary & tlgh school 


Official rating 




.65 






Wiltro7 (1922) 


700 with 1 iou a parlanca 


Administrator rating 


.09 


.07 






tnight (1922) 


19 elementary 
6 high aehool 

53 elamantmry It high school 


Faar rating 
Tsar rating 
Faar rating 




•13 

.60 

03 






*»aa (1923) 


45 high aehool wjoan 
45 high school WOMB 
44 high school aw 


Supervisor rating 
Supervisor rating 
Suparrlaor rating 




.46J 

.29 1 






So»ra (1923) 


110 vlth 1 yr. experience 


Frlncipal rating 


.77 


.73 






Haarln (192?) 


106 with 1 yr. experience 


Suparrlaor rating 




.05 






XcAfee (1930) 


98 elementary 
112 elementary 


Superintendent rating 
Observer rating 




.15 

.40 






IfllMaa (1930) 


116 high aChOOl, 1 Mi 
experience 


Administrator rating 




OO 


.20 


.30 


tfagenherri. (1930) 


191 with 1 yr. experience 


Ada in iat re tor rating 




#01 






And *r 60 ci (1931) 


4®0 teaching cart If 1 cats 
110 Bachalor Degree 


Superintendent rating 
Superintendent rating 


,10 

.22 


.19 

.21 






Bossing (1931) 


100 high aehool 


Administrator rating 




.17 




#19 


Breckinridge (1931) 


213 


Frlncipal rating 


05 








Irlnar (1931) 


26k elementary It high school 
164 elementary k high school 
36 a! scant ary k high aehool 


Suparrlaor rating 
Supervisor rating 
Suparrlaor rating 


o£ 

,6L° 








Broom (1932) 


240 

237 


Administrator rating 
Administrator rating 




.19 


.19 




Broon k Ault (1932) 


81 with 1 yr. experience 

50 with 1 yr. experience 
46 i yr# experience 


Administrator rating 
(official) 

Administrator rating 
for college placement 




.13 

.00 




.24 

-.06 


Coxa k Corns 11 (1933) 


500 (approx.) alawantary 
1 yr. axparlat.es 
400 (approx.) elementary 
1 yr, experience 
112 elementary 2 yr, expert- 
anea 


Suparrlaor rating 
Suparrlaor rating 
Composite obesrrar rating 


.06 


-tC3‘ 

-.01* 

ocf 






Fatsreoe, at >1. (1934) 


63 

47 to 104 


Saperrlaor rating 
Salary 




.12 
.22 to 
.71 






Irlnar 0935) 


53 with 1 yr, experience 


Suparrlaor rating 


06 


#49 






mnipo (1933) 


173 el*— utery k ^r, high 
school 


Average administrator 
rating 




.19 






Hard* ftp (1934) 


231 


Superintendent rating 




.13 




.09 


Odanwellar (1936) 


5(0 ilamUry 


Administrator 


.06 


.29 




.26 



a Jones (1923) used senior grades and Coxe and Cornell (1933) used second semester achievement as measures of scholarship.



.*2 (Faychol.) 



b Based on grades (r = .39); based on students placed scholastically in approximate top half of class (r = .62); based on students definitely
placed in top half of class (r = .81).

c Coefficient of mean square contingency.



Table 22 (cont.)

Investigator

Kriner (1937)
Sandiford, et al. (1937)
Stewart (1940)
Martin (1944)
Jones (1946)
Lins (1946)
Seagoe (1946)
Gould (1947)
Stephens & Lichtenstein (1947)
Espenschade (1948)
Schwartz (1950)
Bach (1952)







Teacher sample
Measure of effectiveness
High school
College grade average
Education
Major courses
Other courses

• 48 (Science) 
,25 (English) 
.13 (Social 
Studies) 


42 in 4-yr. courts, 
with 1 yr. experience 


Supervisor rating 


.27 


.45 


.40 


94 in 2-yr. course, 
vlth 1 yr. experience 


Supervisor rating 


.33 


.40 


.33 


.47 (Science) 
.28 (English) 
.22 (Social 
Studies) 


242 


Inspector riling 




.25 


*19 


.20 (English) 



• 1 3 (History) 
,20 (Geog- 
raphy) 

.24 (Special- 
ists) 



193 rural 


Superintendent rating 




.22 








71 rural 


Superintendent rating 


.33 










123 with J yr. experience 


Superintendent rating 


.07 


.15 








54 high school 


Principal rating 






.05 






51 high school 


Principal rating 




.24 








50 high school 


Principal rating 








.40 




43 high school 


Principal rating 


.13 










33 high school 


Pupil gain 






-.08 






32 high school 


Pupil gain 








.26 




30 high school 


Pupil gain 




-.08 








28 high school 


Pupil gain 


-.22 










10 English 


Pupil gain 


-.43 










58 high school women, 


Composite administrator 




.31 


.23 


.29 


.35 (minor 


1 yr* experience 












subject) 


55 high school women, 


Composite administrator 


.33 










1 yr. experience 
50 high school v ocen f 


Pupil evaluation 




.03 


.05 


.13 


.01 (minor 


1 yr. experience 
46 high school women , 
1 yr. experience 
17 high school women, 


.06 


.53 






subject) 


Pupil evaluation 
Pupil gain 


1 yr. experienre 












16 high school women, 


Pupil gain 


.69 




.55 


,52 


•44 (alnor 


1 yr. experience 












subject) 


25 elementary, 2 yr. 


Supervisor rank (per- 




.03 


-.15 


•01 




experience 


cent Hi) 












113 with 1 pr. experience 


Principal rating 




■ 44 C 








86 elementary 


Pupil gain 




.01 






-,'3 (Intro- 



duct Ion to 
tsachlrvg) 

■01 (iducttlon 
pjychology) 

.19 (Hlstorr of 
•ducat Ion) 

•15 (Kathode of 
teaching 
reeding) 



46 physical education, 


Principal rating 


.12 


.24 


1 yr. experience 








18 with 2 yr* experience 


Supervisor rating 


.24 


.02 


70 high *- bool, 1 sea. 


Principal rating (2 


-.01 

-.06 


.09 


experience 


different seal is) 


-.02 


Superintendent rating 


.08 


-.06 




(2 different raters, 

same scale) 


.00 


-.01 



a Jones (1923) used senior grades and Coxe and Cornell (1933) used second semester achievement as measures of scholarship.

b Based on grades (r = .39); based on students placed scholastically in approximate top half of class (r = .62); based on students definitely
placed in top half of class (r = .81).

c Coefficient of mean square contingency.
part the coefficients tend to be low. In only 9 studies were they as great
as .40 or above, and even within some of these studies great variation is
shown in the size of coefficients obtained, e.g., Knight (178), Jones (171),
McAfee (208), Peterson et al. (254), Lins (203). While the over-all results
are not such as to permit any very confident interpretations, it would
appear that some relationship exists. It may be suspected that the common
relationship of general intelligence to both academic and teaching success
is involved.

In two studies critical ratios rather than correlation coefficients
were reported. In 1937 Stuit (326) found average college grades of 100
"superior" teachers as rated by superintendents and principals to be
significantly higher than for 46 "poor" teachers (CR 2.8). Shannon (305),
who in 1940 compared 111 "highly successful," 111 "average," and 37
"failing" teachers selected from among teachers who were graduated from
a state teachers college during the period 1898 to 1934, also found
success in the field to be related to college scholarship (CR's 2.3 to 8.2).
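
A critical ratio of the kind reported in these two studies is conventionally the difference between two group means divided by the standard error of that difference. The sketch below only illustrates that arithmetic. The group sizes (100 and 46) are taken from the Stuit comparison, but the means and standard deviations are hypothetical values invented for the example, and the function name `critical_ratio` is likewise an assumption, not something drawn from either study.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent group means over its standard error."""
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


# Hypothetical grade averages and spreads; only n = 100 and n = 46 come from the text.
cr = critical_ratio(mean_a=2.60, sd_a=0.45, n_a=100,
                    mean_b=2.38, sd_b=0.44, n_b=46)
print(round(cr, 2))
```

With these invented inputs the ratio comes out a little under 3, the same order of magnitude as the value Stuit reported. The larger the groups or the smaller the spread within them, the larger the ratio for a given difference in means.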

In 16 of the studies reported in Table 22, investigators attempted to 
determine whether or not teaching effectiveness in the field might be pre- 
dicted from achievement in one or more college courses apart from prac- 
tice teaching. Correlations between field performance of a teacher and 
his grades in specific college courses yielded coefficients which tended 
to be low but positive. In only five investigations, Jones (172), Broom 
and Ault (58), Seagoe (299), Stephens and Lichtenstein (323), and Bach 
(12), are negative coefficients reported, these appearing among positive 
relationships also found in these same studies. The results in the case 
of specific courses appear to be much the same as those obtained when 
practice teaching grade or rating is compared with teaching effectiveness 
in the field. 

The relationship of high school grades or ranks to success in teaching
was studied in 13 of the investigations. As will be seen from Table 22,
the correlation coefficients (except for those reported in Jones' 1946
study) are all positive but vary from .07 to .81. The relatively high
coefficients reported by Somers (.77), Kriner (.81 and .62), and Lins
(.69) appear to be somewhat out of line with results obtained by other
investigators.

In the great majority of the studies concerned with the relationship
of scholarship and teaching effectiveness, the question of whether or not
ratings by administrators were influenced by knowledge of the teachers'
college scholastic record is not considered. It should be pointed out
that in the case of supervisors' ratings no investigator could be certain
just what knowledge might contaminate the criterion, nor could this be
controlled. The question of contamination of ratings by knowledge of high
school grades should also be raised; the probability is remote, however,
that many supervisors are aware of the high school grades of the teachers
they rate.
Considerable effort has been expended by investigators in attempting
to discover the relationships existing between on-the-job performance of
teachers and earlier scholarship as reflected in over-all achievement in
high school or college or standing obtained in specific college courses.
The outcome of all of this research appears to be that there is some
relationship but that it is probably small. So far none of these investigators
has shown that the attainment of a particular standing in high school or 
college or the mastery of any single course or group of courses is essen- 
tial to teaching competence. General college scholarship and scholarship 
in specific college courses are both correlated to some extent with prac- 
tice teaching grades. Intelligence test scores are also correlated with 
practice teaching grades. Investigators have apparently treated subject 
matter knowledge as if it were a discrete variable. Scholarship in specific 
college courses, however, is probably just a less reliable measure of gen- 
eral scholarship, or perhaps somewhat more indirectly, of intelligence. 
Zero-order correlations will not indicate whether subject matter knowledge 
per se is related to teaching effectiveness or whether subject matter 
knowledge, general college scholarship, and intelligence are interrelated 
variables. The lack of any substantial communality of content objectives 
of courses that bear the same title under different instructors or in dif- 
ferent colleges makes it unlikely that a course selected by title only 
will be found essential to teaching competence. 



Age and Experience as Related to Instructor Effectiveness 

The relations of age and of experience to instructor effectiveness
are reviewed together because of the obviously close relationship between
these two variables. In 1928, for instance, Bathurst (23) obtained a
coefficient of .88 when he correlated them.

In Table 23 are listed 17 studies in which correlation coefficients
have been reported. (Bathurst's study is included since he used Knight's
Professional Aptitude Test not as a measure of "aptitude" but as a
criterion of teaching effectiveness.) It will be noted that these
coefficients range from -.38 to .53. This suggests either that the
importance of age and experience in teaching effectiveness depends upon
the particular teaching situation involved or that product-moment
correlations provide an inadequate indication of any nonlinear
relationships that may exist.
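
The Bathurst (1928) entries in Table 23 that are labeled "experience factored out" and "age factored out" can be checked against the zero-order coefficients, on the assumption that "factored out" denotes a first-order partial correlation. The report does not state the formula, so that reading is an assumption. The sketch uses the tabled values of .06 (age) and .15 (experience) together with the age-experience correlation of .88 quoted in the text; the function and variable names are illustrative only.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


R_AGE_CRIT = 0.06  # age vs. Knight test score (Table 23, Bathurst 1928)
R_EXP_CRIT = 0.15  # experience vs. Knight test score (Table 23, Bathurst 1928)
R_AGE_EXP = 0.88   # age vs. experience, as quoted in the text

# Age with experience held constant, then experience with age held constant.
age_net = partial_r(R_AGE_CRIT, R_AGE_EXP, R_EXP_CRIT)
exp_net = partial_r(R_EXP_CRIT, R_AGE_EXP, R_AGE_CRIT)
print(round(age_net, 2), round(exp_net, 2))
```

Under that assumption the two results round to -.15 and .21, which agree with the tabled figures for the 1928 sample. The example also shows how a small positive zero-order coefficient for age can turn negative once the closely related experience variable is held constant.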



That the relationship between age or experience and estimates of
instructor effectiveness may be curvilinear is suggested by the studies
of Ruediger and Strayer (283), Young (362, 363), and Davis (96). Ruediger
and Strayer, in 1910, used supervisors' estimates of 204 elementary
teachers while Young, as reported in 1937 and 1939, used principals'
ratings of 1521 teachers. These investigators reported improvement in
instructor effectiveness up to 5 years, no improvement from 5 to 20 years,
and some decline thereafter. Davis, in 1934, on the basis of an
investigation involving approximately 1700 high school teachers, his
criterion being pupil success in passing State Board tests, concluded
that pupils taught
Table 23

Age and Experience as Related to Teaching Effectiveness

Investigator             Teacher sample                Age or experience                  Measure of teacher effectiveness               Correlation

Meriam (1905)            387 elementary                Experience (0 to 16 yr.)           Supervisor ranking                             .10

Knight (1922)            (Number unreported)           Age                                Fellow teacher rating                          [illegible]
                         elementary & high school      Age                                Supervisor rating                              .03
                                                       Experience                         Fellow teacher rating                          -.04
                                                       Experience                         Supervisor rating                              .14

Somers (1923)            110 with 1 yr. experience     Age                                Supervisor rating                              .07

Lang (1924)              154 elementary                Experience (present school only)   Supervisor rating                              .26 & .39
                         113 high school               Experience (present school only)   Supervisor rating                              .46 & .42

Boardman (1928)          88 high school                Age                                Composite ranking (supervisor, associate       .34
                                                       Experience                         teacher & pupil rating)                        .39

Barthelmess & Boyer      5002 elementary               Experience (0 to 30 yr.)           Principal ranking                              .27
(1928)                   1220 jr. high school          Experience (0 to 30 yr.)           Principal ranking                              .36

Davis & French (1928)    2156                          Experience                         Official rating                                .23

Bathurst (1928)          171 high school               Age                                Knight Professional Aptitude Test              .06
                                                       Experience                         Knight Professional Aptitude Test              .15
                                                       Age (experience factored out)      Knight Professional Aptitude Test              -.15
                                                       Experience (age factored out)      Knight Professional Aptitude Test              .21

Bathurst (1929)          300 elementary                Age                                Knight Professional Aptitude Test              -.03
                                                       Experience                         Knight Professional Aptitude Test              .06
                                                       Age (experience factored out)      Knight Professional Aptitude Test              -.37
                                                       Experience (age factored out)      Knight Professional Aptitude Test              .18

Odenweller (1929)        560 elementary                Age (18 to [illegible] yr.)        Ranking (supervisor, principal, assistant      .15
                                                       Experience (1 to 7 yr.)            principal)                                     .15

Kriner (1931)            262 (131 best & 131 worst)    Experience                         Superintendent opinion (elementary teachers)   .10*
                                                       Experience                         Superintendent opinion (high school teachers)  .26*
                                                       Experience                         Superintendent opinion (total group)           .18*

Rolfe (1945)             47 elementary, 1- & 2-room    Age (20 to 54 yr.)                 Residual pupil gain ([illegible] 8th grade)    [illegible]
                                                       Experience (1 to 30 yr.)                                                          [illegible]

Jones (1946)             54 high school                Experience                         Supervisor rating (Wisconsin M-Blank)          [illegible]
                         33 high school                Experience                         Supervisor rating (Wisconsin M-Blank)          .04

Stephens & Lichtenstein  40 (approx.) elementary,      Age                                Pupil achievement quotient                     .41
(1947)                   normal school grads           Experience (0 to 9 yr.)            Pupil achievement quotient                     .53
                         23 (approx.) elementary,      Age                                Pupil achievement quotient                     -.38
                         city school grads             Experience (4 to 24 yr.)           Pupil achievement quotient                     [illegible]

Riesch (1949)            22 elementary, city & rural   Age (20 to 68 yr.)                 Supervisor rating (Wisconsin M-Blank)          [illegible]
                                                       Experience (1 to 43 yr.)           Supervisor rating (Wisconsin M-Blank)          .35
                                                       Age (20 to 68 yr.)                 Residual pupil gain                            -.01
                                                       Experience (1 to 43 yr.)           Residual pupil gain                            .00
                         18 elementary, rural          Age                                Supervisor rating (Wisconsin M-Blank)          .08
                                                       Experience                         Supervisor rating (Wisconsin M-Blank)          .11

Ryans (1951)             203 elementary                Experience (divided into groups    Composite observer rating                      [illegible]**
                                                       of 1 to 4 yr., 5 to 9 yr.,
                                                       10 or more yr.)

* Pearson cos π coefficients.
** Coefficient of contingency.
[...] by teachers with one year's experience but no better than pupils
taught by teachers with two years of experience.

In 1929 Birkelo (36), using student ratings of elementary and high
school teachers, apparently showed increased instructor effectiveness
with age, a result in agreement with that found by Daniel (91) in 1944.
The significance of these findings, as well as those of Ruediger and
Strayer (283) and Young (362, 363) just mentioned, is somewhat doubtful,
however, since the proportion of each age or experience group in the
total samples used is unknown.

In the few attempts to study the relationship between the length of
time a teacher was employed in the school and efficiency ratings, higher
correlations were found, as might be expected. In 1924 Lang (193)
reported correlations ranging from .26 to .46 between supervisory rating
of teaching efficiency and the teacher's local experience. In 1934
Davis (96), in a study of teaching efficiency based on the per cent of
each teacher's pupils passing state tests in high school subjects, stated
that teachers with longer tenure in a given school were more successful
in passing pupils through state tests than were teachers who had been
employed in the same school for a shorter period of time. However, the
schools which had the highest percentage of pupils passing the state
tests were those schools with markedly high teacher turnover. Because
of these confusing results Davis concludes, "It would seem more likely
that the tenure of the teacher is a result of her success as measured
by State Board tests than that success in State Board tests is a result
of increased tenure." In 1945 Brookover (55) found that length of
acquaintance with pupil and length of time teacher had taught in the
schools, as well as age of teacher, were positively related to pupil
ratings.

Several investigators in this area reported no significant differences.
In 1936 Heilman and Armentrout (148) reported results of ratings
on the Purdue Scale of 46 college teachers by 2115 students in 50 classes.
In terms of experience teachers were divided into four groups: 7 to 12
years of experience, 12 to 17, 17 to 27, and 27 or more years of
experience. Instructors were also divided into age groups by five-year
intervals. No reliable differences in rating scores were found in either
case. In 1946 Blair (37) compared 92 teachers with less than 10 years of
experience with 113 teachers with 10 or more years of experience in terms
of the number of "poor" answers on the multiple-choice Rorschach test.
He also compared 107 teachers under 35 years of age with 98 teachers over
35 years of age. Differences were not significant in either comparison.

Englehart and Tucker (108), in 1936, asked 224 high school pupils to
choose their best and worst teachers and to check their appropriate traits
on a list. Their findings with respect to age are summarized as follows:
                   Good teachers        Poor teachers
Age                No.       %          No.       %

20 to 29           28      23.9         27      25.3
30 to 39           68      58.1         47      43.9
40 to 49           16      13.7         26      24.3
50 or above         5       4.3          7       6.5
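
As an illustration only, the counts in this tabulation lend themselves to a chi-square test of independence between age group and the good/poor designation. The sketch below is not part of the Englehart and Tucker analysis. It treats the 224 nominations as independent observations, which is an assumption, since each pupil apparently named both a best and a worst teacher. The function name `chi_square` and the constant name are illustrative.

```python
# Counts from the tabulation above, for age groups 20-29, 30-39, 40-49, 50 or above.
good = [28, 68, 16, 5]
poor = [27, 47, 26, 7]


def chi_square(rows):
    """Pearson chi-square statistic for a table given as a list of equal-length rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            stat += (observed - expected) ** 2 / expected
    return stat


chi2 = chi_square([good, poor])
CRITICAL_05_DF3 = 7.815  # tabled chi-square value, 3 degrees of freedom, .05 level
print(round(chi2, 2), chi2 > CRITICAL_05_DF3)
```

Under these assumptions the statistic falls somewhat short of the tabled .05 value for 3 degrees of freedom, so the apparent excess of 30-to-39-year-olds among the "good" teachers would not by itself pass that test.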



No significance test with respect to the differences in percentages
was applied. In 1946 Nemec (244) made a study of a group of 265
probationary teachers who failed to receive certificates at the end of a
two-year probationary period because of unfavorable supervisory reports.
When these teachers were divided into two groups (ages 19 to 22 and 23
years and over) according to the age at which they began teaching,
Nemec found no differences which were significant at the .05 level.

Ryans (286), in a factor analysis study of trained observers' ratings of
teachers on the basis of directly observable teacher behaviors, found
that teachers (N = 60) with 1 to 4 years of experience were significantly
different from teachers (N = 32) with 5 to 9 years of experience at the
.01 level for two factors, which he named "controlled pupil activity
and business-like approach" and "teacher calm and consistent, liked
because human," and for the total rating. Differences were significant
at the .05 level for two factors he called "pupil participation and
teacher open-mindedness" and "sociability." The teachers with 5 to
9 years of experience were significantly different from the teachers
(N = 111) with 10 or more years of experience at the .01 level for
factors "pupil participation and teacher open-mindedness" and "teacher
calm and consistent, liked because human" and at the .05 level for
total rating by the observers. The teachers with 1 to 4 years of
experience were significantly different from the teachers with 10 or more
years of experience at the .01 level for "controlled pupil activity
and business-like approach."

The research findings of Davis (96), Meriam (225), Ruediger and
Strayer (283), Ryans (286), and Young (362, 363) imply that teaching
effectiveness bears a curvilinear relationship to age or experience.
The zero or near zero correlation coefficients reported by Bathurst
(23, 24), Jones (172), Knight (178), Odenweller (247), Riesch (275), and
Rolfe (280), instead of showing lack of relationship, probably indicate
the inapplicability of the Pearson product-moment correlation method to
the nonrectilinear data involved. It appears that a teacher's rated
effectiveness increases at first rather rapidly with experience and then
more slowly up to 5 years or beyond. There is then a leveling off and
the teacher may show little change in rated performance for the next 15
or 20 years, after which, as in most occupations, there tends to be a
decline. It must be borne in mind, however, that ratings in such studies
as the foregoing may suffer from the "logical error" which results from
an implicit assumption that the young, inexperienced teachers cannot
be as good as those of 5 or more years of experience.
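
The point about the product-moment method can be illustrated with a small sketch. The rating curve below is entirely hypothetical: it rises for five years, stays level, and then declines, and it is made deliberately symmetric so that the linear coefficient over the whole range vanishes. None of the numbers come from the studies reviewed, and the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def hypothetical_rating(years):
    """Invented curve: rise to year 5, plateau to year 25, decline to year 30."""
    if years <= 5:
        return float(years)
    if years <= 25:
        return 5.0
    return 5.0 - (years - 25)


years = list(range(31))  # 0 through 30 years of experience
ratings = [hypothetical_rating(y) for y in years]

r_all = pearson_r(years, ratings)            # whole range of experience
r_early = pearson_r(years[:6], ratings[:6])  # first five years only
print(abs(round(r_all, 2)), round(r_early, 2))
```

Over the whole range the coefficient is essentially zero even though the rating depends entirely on experience, while within the first five years the same data give a coefficient near 1. A near-zero coefficient from a sample spanning many years of experience is therefore compatible with a strong but nonlinear relationship.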
In interpreting the alleged decline in teaching effectiveness after
20 years or more of experience, the effect on ratings of the physical 
and mental changes accompanying aging in general must be considered. It 
is quite conceivable that while the ratings of students and supervisors 
might favor the younger and more vivacious teacher, the real effective- 
ness of teachers in bringing about student changes might not be related 
to age at all. There are as yet, however, no adequate studies of this 
relationship. 

The research findings on age and experience have some interesting
and rather important implications for the Air Training Command. In a
study of the correlates of instructor morale in Air Force technical
schools, Richey and Berkshire (273) reported percentages with respect
to experience of 3117 military and 797 civilian instructors as shown
in Table 24. If more valid techniques eventually confirm the findings



Table 24

Teaching Experience of Military and Civilian Instructors
In Air Force Technical Schools (a)

Experience              Military     Civilian

Less than 6 mos.         24.2%         5.4%
6 mos. to 1 yr.          41.3         11.8
1 or 2 yr.               25.4         18.9
3 or 4 yr.                7.0         17.5
5 yr. or more             2.1         46.4

(a) From Richey and Berkshire (273).



of previous investigations that an instructor continues to improve for
the first five years, the great majority of military instructors have not
reached the period of greatest effectiveness. The present rotation policy
may be manifestly working against best utilization of instructor
potentiality in Air Force technical schools in that military personnel are
not permitted to function as instructors long enough for them to achieve
maximum efficiency. Any interpretation of the results of these studies
for the military situation, however, must take into account the fact
that military instructors may repeat the same subject matter as many as
25 times a year as contrasted with public school teachers who repeat
the same subject matter only once or twice a year. It thus may well
be that military instructors reach their peak in a shorter period of
time than public school instructors.
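
The size of the "great majority" can be read off Table 24 by adding the brackets below five years. The short sketch below does that arithmetic, reading the Table 24 percentages as 24.2, 41.3, 25.4, 7.0, and 2.1 for military instructors and 5.4, 11.8, 18.9, 17.5, and 46.4 for civilian instructors (each column then sums to 100). The function name is illustrative.

```python
# Table 24 percentages, ordered from "less than 6 mos." to "5 yr. or more".
military = [24.2, 41.3, 25.4, 7.0, 2.1]
civilian = [5.4, 11.8, 18.9, 17.5, 46.4]


def share_below_five_years(percentages):
    """Sum of every bracket except the last one ("5 yr. or more")."""
    return round(sum(percentages[:-1]), 1)


print(share_below_five_years(military), share_below_five_years(civilian))
```

On this reading roughly 98 per cent of the military instructors, against a little over half of the civilian instructors, had fewer than five years of teaching experience.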
Knowledge of Subject Matter, Present Professional Information,
And Teacher Examination Scores as Related to Instructor
Effectiveness



Knowledge of Subject Matter as Related to Instructor Effectiveness 

It is frequently stated that the good teacher is the one "who knows
his stuff," that knowledge of subject matter being taught is the prime
requisite of teaching success. With respect to this hypothesis the re- 
viewers considered the findings of some 20 studies where various criteria 
of instructor competence were correlated with one or more measures of 
professional information or subject matter knowledge. 

Much variability is evident among the coefficients found when scores
on subject-matter tests are correlated with criteria of instructor
competence. As shown in Table 25, these vary from -.69 to .58. It would
appear that whether or not knowledge of subject matter is related to
instructor competence is a function of the particular teaching situation.
The negative relationships found in some studies suggest that too much
knowledge on the part of the teacher may result in teaching "over the
heads" of the students.

Two minor studies are not included in Table 25 because correlation
coefficients were not computed. Madsen (213), in 1927, found that in
terms of scores received on a test of elementary grade subjects, all
except 1 of 31 teacher failures were found to be in the lowest 10% of
a group of teachers studied. Allen (2), in 1938, using a test that
included subject-matter knowledge, reported a low relationship between
test results and teacher success for a group of 60 very superior and 60
very inferior teachers as rated by three supervisors. Only language
usage and spelling significantly differentiated superior from inferior
teachers.



Professional Information as Related to Instructor Effectiveness 

On the basis of the nine available studies which have been summarized
in Table 26, scores on tests of professional information tend to bear
some slight relationship to several measures of instructor competence.
With two exceptions, Rolfe (280) and Stephens and Lichtenstein (323), all
the coefficients are positive. However, only two investigators, Crabbs
(89) and Betts (32), report any coefficients greater than .40.



National Teacher Examination Scores as Related to Instructor Effectiveness 

Flanagan (112), in 1941, obtained a correlation coefficient of .51 be- 
tween scores on the Common Examination of the National Teacher Examination 
and superintendents’ ratings. He also reports coefficients significant at 
Table 25

Relation of Scores on Subject-Matter Tests to Measures of Instructor Effectiveness

Investigator    Teacher sample    Test    Measure of effectiveness    Correlation



**Ul ( 19))) 


61 slomnUry 


Subject-** list vottbultry 


Hpll tsln (roidlnp) 


.Cl to .66 


Cow 1 Com* 11 (15)J> 


500 (tpprox.) tloartUry, 1 p. sip* ri * 


Troislsr tr^ls, 


Syporrlscr rstlni 


.00 




one* 


bhljpfl* 2*«4 &a» 


Supervisor rttlr* 


-.02 




100 (tf-proi.) oloasnlsry, 2 fr, s«j*rl* 


Trsnlsr £n|li»b 


Atyenrlsor rstlni 


.06 




•MS 


whipi* n*od Lr4 


Auporrlsor rtvlni 


.06 




112 iloMnliry, 2 p* oxporlono* 


Tr«ssl*r Eo|lith 


CeiposU* ebssnrsr 


.12 








rstlni 








SMppls ft»*4tn| 


Coof>«*n« ebssrtsr 


.10 








mini 




B*rr, rt *1. (1933) 


66 sloatnUry 


•tW»r4 6rtth«ll« 


Hpii |*U (srtth**tle) 
A.Q. 

Hpil pin (eo*j. *.9. 1 


.12 






Sltnferd 6rltho*lte 


-.03 








rsv |*ln) 








Linford l*ro*sllt 


SusorlMtn&stiV rtUr* 
(coop* 9 *c* l*i) 


.02 


triosr (1935) 


55 sUI 1 p* iipirtiMi 


Cross tr^llsb 


Hprtlsor rollm 


• 50. 




Co^y. tn|Ush 

Coof. Utortry 6cc*\l?iUM# 


Soprvtsof rOJ /4 
Aipo rr\oo^ , rs\Vr4 


.60 






U;. Omul 


Apmlur rttl.nl 


.51 


IrlAsr (1939) 


62 (6-p. t««m) 1 jt. 


Cross tnfliU 


Attjorrisor rstir* 


.31 






Cwf. tn|Ush, IvtsrsWro, 


Kporrlsor rollr| 


.31, .33, .>0 




96 (2-p. toorss) 1 jr- sip*rl«nes 


0»r*rt\ Scisnc- 
Cross tn|lWft 


>*pr, isor rutin* 


.3$ 




Cooo. Lnfllsh, Uimtiiri, 


kf»Msor rstlni 


.21! .15, .21 






Conors/ Scion:* 






R*rtU (1964) 


123 witb 1 p. siporWncs 


toop.r VUsrstor*. fir* 
Aril, feltf**, Soctti 
ttodlsb fethomtUt 
tsothor CollOfo Wlirt 


Avporlntkt)dS£6 rstlni 


.1C, .1% .C) 
•fits .15 






6yporLftt*na«ri\ Mttn| 


.0? 


UWs (1963) 


*? i]»«*fllti 7 , 3- 6 2-rooa 


6*«rle*n Country Cities 1 
OwiflXlt 

9irt*sM*laf«. Nb. /robs. 


Hr VI ttU Umisf.^lp) 


-.a 






Npu Ills (sltisonsMp) 


.01 


>*n: »r ( 1 * 5 ) 


21 ilasaUfTt rurftl 


iNfiew Ceontry Ct*les A 


Nrll ftljR (soclU 


04- 






UbrtmoOM 


stodiss) 








vrlthtstoo* (Rciosrch 


Hpll (tit isoolsl 


*51 






Abllllj) 


stoflUs) 




forsi (1H6) 


60 school 


RtsSlni Coo? robot. t Ism 


Hporoisor r*tlr< 


• 12 




31 Mfh sehool 


Pe*4in| Coopohon»l*A 


Hpll |*lo (ttrioo* 


.13 








Kb>ets) 






1) Mfb school (l^liss) 


CoMpofonsion 


fsyll pin Uullsfc) 


-.69 


lUi (1K6) 


U M.|h School •coon, l fr. Itptrl*** 


Coop>* Co|llS« 


Cooposlt* rsfOrrisoP 


.22 








rstlni 






J? M|6 school wssr, 1 fr. S*pori*r>c* 


Co of. C/llUh 


HrU Polostieo 
Hpll pis (nrlMi 


-.33 




11 6 l|!S school sqh>% 1 fr. *tp*rl*nc l 


Coof , tnflish 


.Cl 






sob>as) 






51 M|k School «cMn, 1 p, stpofit^c* 


Coop. I«s4b4 


Coaposlts toptroloor 
Mill , 

Hpll Sto 1 sit too 


•H 




19 H|h School ‘■■ssn, ) ff. lijsrSirci 


Coop. tssllr* 


■ »3* 




19 Klgb school ocwhi, ) p. STporionc* 


Coop. f**41n| 


H? 11 |«Ln (mists 


.10 








tobjsrts) 




CM) 


13 oil* • p* l^orls^cs 


x>f, *Ulh*«t!es, 


lopo prison ro&sini 


-.C*i * .26, 






tstsrsl Aocisl 

ItoSlil, CeriiHpsrtij 
Afftirt 


.13, *12 






Os«U UH9) 


113 rUS ) p, i^sMsmi 


Coop. CW.soporsry 


/rihtlpl rstlni 


•31* 






6/fslrs 






JtsjfOM 1 lleMsnstolo (1969) 


20 Slmorttry 


AHVhMtU MhMrUH, 


Pupil |tls UHthaotie) 


-.53, ..06 


LrUWtlt lossordht 
2»s4Ln| CWysfcSMlOl 


Hot) |tl« 


-.61, -*l<6-,20 










IntliSh C**|*, 3o*ftor*« 
It factor* - 










•pm H 


Hr|l plr 


-.11 



* litln.n *f 1* Km t-cboo 1 1 • 

4 CooffUtOrt if mm Ifon e*M lr*Or<f 
Table 26

Relation of Scores on Professional Information Tests to Measures of Instructor Effectiveness

Investigator

Teacher sample

Test

Measure of effectiveness

Correlation


Crabha (1925) 


(tvi eber unreported) 


Steele-Herrlrg 


Pupil gain 


.0$ 


« latent ary 




Supervisor rejvin* 


.41 


VartMn ( 1920) 


M Ugh achool 


Frofeailonal Information (unpub- 


Composite rank (auper- 


• 24 




liahH) 

Procedural 


vieor, teacher, pupil) 


.26 




UllMLA (19X>) 


116 Mgh echool, 1 Mi, 


Odell (principle! of teachirg) 


Average rurerlr. tend eat 4 
principal rating 


.14 




experience 








Weber (object l*ee of teaching) 


a re rag a mparLhteMtr.t 4 


.09 








principal rating 




Nil* (193)) 


61 elementary 


Profeiiional InforMtion (coftpoe- 


Pupil gain 


.10 to .46 


lta 16 teiti) 




fcarr, tv H. (19)5) 


66 flerentery 


Torgeraon Frofaiaional InforM- 


Pupil gain A.Q. 


.2) 






tloo 










Torgeraon Profeialenal Inf c na- 
tion 


Pupil gain (coapoetu 
A.Q. 4 raw gain). 


.<* 






Torgeraon frofaaiional InforM- 
tloe ' 


R\iterlM»ad*nt rating 
ieoapoaite 7 scales) 


.16 


Xartifl (2944) 


12) arHh 1 fT, experience 


Teacher 'Col leg# IleMbUry 


Stpe mint ardent ratify 


.02 


*«if* (mj) 


47 elaMntery, 1- k 


Uvareni-Steinaet! (education 


Pupil gain 


-.06 


i-trca 


orientation) 




UHUt (1945) 


21 elementary, rural 


Uv*rtn*-$telroeit (education 
orientation) 


F-pil gain 


.x> 


1 Liehtenateia (194?) 


)5-42 elanentery (nor- 


Professional ao-einatlon 


Pupil gain 


-.11 


Ml eehool) 










il-H eleaentery (illy 
inlnlng echool) 


frofenlonal axaklnatioa 


Pupil gain 


-.49 



* Let Ws l to of l- &rd Kr>« kWl». 

the .05 level, between total scores on the Common Examination of the
National Teacher Examination and the proportion of students reporting
the particular teacher's name in response to the question, "Which
teachers seemed to have a broad knowledge of other subjects besides
the one you had with them?" On the other hand, when Lins (203), in
1946, correlated National Teachers Examination scores with pupil
evaluation of their teachers he obtained a correlation coefficient
of -.30, significant at the .01 level of confidence. When Lins used a
composite gain criterion he found a coefficient of .45. The latter
figure, however, is probably not significant since only seven teachers
were involved.
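
The remark about seven teachers can be checked with a short calculation.
The sketch below is an illustration only. It assumes the usual t test for
a product-moment correlation (the text does not say which test the
reviewers had in mind), and the critical value is the standard two-tailed
5 per cent table value of t for 5 degrees of freedom.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a zero population correlation (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Two-tailed 5 per cent critical value of Student's t for 5 degrees of
# freedom, taken from a standard t table.
T_CRIT_DF5_05 = 2.571

t_seven = t_for_r(0.45, 7)           # coefficient of .45 on seven teachers
is_significant = t_seven > T_CRIT_DF5_05

print(f"t = {t_seven:.2f}, significant at .05: {is_significant}")
```

Under these assumptions the obtained t falls well short of the tabled
value, which agrees with the reviewers' judgment that the coefficient is
probably not significant.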

In 1951 Ryans (286) correlated scores obtained by 192 elementary
and 165 secondary teachers on the General Principles and Methods of
Teaching test of the 1949 National Teachers Examination Battery with
two kinds of ratings made by principals. For the elementary teachers
the correlation coefficient obtained between examination scores and
principals' ratings on an observation blank was .17, and between
examination scores and principals' ratings of over-all effectiveness, .23.
The corresponding coefficients for the secondary teachers were .13 and
.15. The principals' ratings on the two blanks correlated .83 for both
groups of teachers. When an analysis was made of examination scores
obtained by the upper and lower groups of the teachers, differences
significant at the .01 level were obtained with respect to 52 "high" and
52 "low" elementary teachers, but the differences were not significant at
the .05 level for the 45 "high" and 45 "low" secondary teachers.

Despite the more or less unpromising results that have been reported
by investigators of the relationship of professional information and
knowledge of subject matter to instructor effectiveness, this might still
be a field in which useful research work can be done. The restriction in
range of information of elementary school teachers might account for some
of the low correlation coefficients. It is possible, too, that the
particular subject matter involved may be a factor in determining the
relationship between an instructor's competence and his knowledge of
subject matter and/or professional information. It appears that in
teaching certain technical school subjects, at least, the amount of
technical information possessed by the instructor may be important. Morsh
and Swanson (232) in a small exploratory study found a correlation
coefficient of .45 (significantly different from zero at the .01 level of
confidence) between power plant proficiency examination scores and
supervisors' ratings of 73 instructors on a forced-choice form.
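
The effect of restriction in range on a coefficient can be illustrated
with the standard correction for direct restriction on the predictor
(often called Thorndike's Case 2). This is a sketch under stated
assumptions: the observed coefficient of .20 and the 2-to-1 ratio of
standard deviations are hypothetical values chosen for illustration, not
figures from any study reviewed here.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Estimate the unrestricted correlation.

    r_restricted: correlation observed in the restricted group.
    sd_ratio: unrestricted SD of the predictor divided by its restricted SD.
    """
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(
        1.0 - r_restricted ** 2 + (r_restricted ** 2) * (sd_ratio ** 2)
    )
    return numerator / denominator


# Hypothetical case: r = .20 in a group whose spread of test scores is
# half that of the unrestricted population.
r_corrected = correct_for_range_restriction(0.20, 2.0)
print(f"corrected r = {r_corrected:.2f}")
```

With no restriction (a ratio of 1) the function returns the observed
coefficient unchanged. The narrower the group, the more the observed
coefficient understates the relationship.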

An Air Force technical school instructor must possess a certain minimum
of technical information. He must be familiar with certain facts,
must possess the requisite skills, and must understand the procedures
involved in the specialty he is teaching in order to impart these facts,
skills, and techniques to his students. The differential between
instructors' knowledge as compared with that of their students is also an
important consideration. The instructor with wide experience and background
or technical information which goes far beyond that of his students may
have the same difficulty as that of the overly intelligent instructor in
communicating at the student level. On the other hand, an instructor
who has the bare minimum of the knowledge requirements may be put in an
embarrassing position or may actually lose the respect of older,
experienced students who know more than the instructor about the subject at
hand. The extent and implications of the differences between
subject-matter knowledge of instructors and the knowledge of their students
may vary from course to course in ways only to be determined through
investigation.



Extracurricular Activities and General Culture Test Scores
Versus Instructor Effectiveness



Extracurricular Activities 

There is rather widespread belief among school administrators that a
teacher who has taken part in activities outside the classroom in high
school or college thereby becomes a more rounded person and makes a
better teacher. In two investigations (292, 305) critical ratios were
computed between teaching effectiveness of groups of teachers who as
students had participated in extraclassroom activities as compared with
teachers who had been nonparticipants.

Sandiford et al. (292), in 1937, compared the top and bottom third of
the group when 336 student teachers were ranked according to teaching
grades. Significant critical ratios favoring the top third were found
with respect to several extracurricular activities. In terms of number
of extracurricular participations, Shannon (305), in 1940, reported
significant critical ratios when 86 most successful men teachers were
compared with 24 failures and when 111 most successful men and women
teachers were compared with 37 failures.

Since the less able student cannot keep up with his studies if he 
participates and hence refrains from participation or is not allowed to 
participate in extracurricular activities, it is necessary to partial out 
scholastic ability if the relationships found by Sandiford, Shannon, and 
others are to be attributed to the student's becoming a "more rounded 
person." As they stand, these results merely reflect the tendency for 
the brighter students both to get higher grades in all college subjects 
(including student teaching) and to participate more in extracurricular 
activities. 
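
Partialling out scholastic ability, as recommended above, amounts to
computing a first-order partial correlation. The sketch below shows the
standard formula. The three input coefficients are hypothetical values
chosen for illustration and are not taken from Sandiford, Shannon, or any
other study reviewed here.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Hypothetical values:
#   x = extracurricular participation, y = teaching grade,
#   z = scholastic ability.
r_participation_grade = 0.30
r_participation_ability = 0.50
r_grade_ability = 0.50

r_partial = partial_correlation(
    r_participation_grade, r_participation_ability, r_grade_ability
)
print(f"partial r = {r_partial:.3f}")
```

In this hypothetical case most of the apparent relationship between
participation and teaching grade disappears once ability is held
constant, which is the outcome the reviewers' argument anticipates.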

Several investigators (171, 182, 185, 196, 218, 298, 299, 319, 320,
324, 344) have reported correlations found between teacher or student
teacher participation in extracurricular activities and ratings of
teaching effectiveness. As will be seen from Table 27, in the nine studies
of teachers on the job the correlation coefficients range from -.06 to
.46. In general, investigators found low positive relationships between
extracurricular activity and instructor effectiveness. On the basis of
the results of the studies reviewed, there appears to be slight
justification for further search for selection or evaluation measures in
terms of the amount of extracurricular participation of a teacher while a
student in high school or college.



General Culture Test Scores 



Six investigators attempted to correlate scores on the Cooperative
General Culture Test with measures of teacher competence. The results
are markedly inconsistent, with a rather strong negative relationship
being indicated in several instances. These studies are summarized in
Table 28. In addition to the studies reported in Table 28, several
investigators (125, 161, 184, 218, 298) correlated total scores on the
Cooperative General Culture Test with student teaching grades.
Correlation coefficients obtained ranged from -.02 in the Seagoe study
(298) with 31 student teachers to .21 in the Kriner study (184) with 55
student teachers. The studies reviewed appear to indicate that the
relations of Cooperative General Culture Test scores to instructor
effectiveness differ little from those reported for other subject matter
tests.
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Relation of Extracurricular Activities to Instructor Effectiveness 



[Table 27 was printed as a rotated page and its body is not legible in
this copy. Legible fragments: Stewart (19^0), 1^5 rural, extracurricular,
superintendent rating; 17 high school women, high school extracurricular,
pupil gain.]



Table 28

Relation of Scores on the Cooperative General Culture Test
To Measures of Instructor Effectiveness

Investigator     Teacher sample                            Measure of effectiveness       Correlation coefficient

Kriner (1935)    55 with 1 yr. experience                  Supervisor rating                .30
Kriner (1937)    94 (2-yr. course), 1 yr. experience       Supervisor rating                .25
                 42 (4-yr. course), 1 yr. experience       Supervisor rating                .22
Martin (1944)    123 with 1 yr. experience                 Superintendent rating            .11
Seagoe (1946)    25 elementary, 2 yr. experience           Supervisor ranking              -.01
Jones (1946)     50 high school                            Principal rating                 .03
                 30 high school                            Pupil gain                      -.23
                 13 English                                Pupil gain                      -.58
Lins (1946)      57 high school women, 1 yr. experience    Composite supervisor rating      .05
                 50 high school women, 1 yr. experience    Pupil evaluation                -.34
                 17 high school women, 1 yr. experience    Pupil gain                       .23



Socioeconomic Status, Sex, and Marital Status
Versus Instructor Effectiveness

Socioeconomic Status of Instructor 

In 1930 Ullman (339), in an attempt to predict teaching success,
among other measures used the Sims Score Card to determine socioeconomic
status of 116 junior and senior high school teachers with one semester
experience. Near zero coefficients resulted when socioeconomic status
scores were correlated with social intelligence, general intelligence,
knowledge of principles of teaching, knowledge of aims of secondary
education, self-rating, academic marks, education marks, major subject
marks, and practice teaching rating. In the case of teaching interest, as
measured by the Strong Interest Blank, a coefficient of -.25 was obtained.
This negative relationship appears reasonable considering the low salaries
of teachers and the opportunity for individuals of high socioeconomic
status to enter professions requiring more costly preparation, but there
is no reason why economic status should be related to the other variables.
The correlation between socioeconomic status and rated success in the
field was .19. Any such low positive coefficient may mean only that
supervisors are influenced somewhat by the socioeconomic standing of their
teachers. It could mean, too, that persons from the higher socioeconomic
group do make better teachers because of greater social poise.

Kriner (182), in 1931, made a study of 131 best and 131 poorest
teachers within a school system as judged by superintendents. He found
that high school teachers who came from a rural area and whose fathers were
farmers and elementary school teachers who came from urban communities and
whose fathers were businessmen had the best chance for teacher success.
Either type of teacher, especially the elementary, was handicapped if
their fathers were artisans and especially handicapped if their fathers
were laborers. To enter the teaching profession because of financial
reasons or compulsion predicted substantially against teaching success.
Size of family affected teacher success probably as a by-product of
financial reasons. Travel and past illness had little if any relationship
to teacher success. Kriner's results are probably not specific with
teachers. They may simply be demonstrating the truism that those from the
higher status groups have greater probabilities of success in life than
those less fortunate.

Phillips (256) secured ratings by superintendents and principals of
173 elementary and junior high school teachers. He also administered the
Sims Socio-Economic Scale to the same group. The resulting correlation
coefficient between these measures was .05. When the ratings were
converted to sigma scores, Phillips reports a correlation of .22 for the
entire group and a critical ratio of 3.5 for two groups of 43 teachers
each standing at the extremes of teaching ability as rated administratively.

Rolfe (280) computed correlations between achievement in citizenship
of 338 seventh and eighth grade pupils from one- and two-room rural schools
and various measures of their 47 teachers. He reported a correlation
coefficient of -.15 between the teachers' Sims Socio-Economic Status scores
and pupil gains.

The results obtained with the Sims Socio-Economic Scale, like those
found with the Cooperative General Culture Test, seem to provide little
incentive for further research in this area.

With the exception of Rolfe's (280) study the criterion used in these
studies was supervisory ratings, which are often negatively correlated with
student gain. It is possible that with other criteria and with other
hypotheses involving socioeconomic status of teachers research of more
probable productiveness might be undertaken. Socioeconomic status of the
teacher is probably not of significance in itself but only as it might
be reflected in various "psychological" dimensions of teachers. For
instance, if extreme upward social motility has characterized a given
teacher and this motility has resulted in insecurity and anxiety on the
teacher's part, this might in turn be reflected in the teacher's pattern
of classroom behavior or in the adjustments the teacher makes to
administrative personnel, fellow teachers, and pupils. Instead of looking
for people who have exhibited this social motility, or who possess a
certain socioeconomic status, investigation might be directed toward the
manifest degree of anxiety or insecurity.



Sex of Instructor as Related to Instructor Effectiveness 



In Table 29 are summarized the ten available studies in which sex of
instructors was related to instructor effectiveness. It will be noted
that criteria of effectiveness employed included student ratings, student
designation of best teacher, average class marks, administrative ratings,
and success or failure on the job. One investigator used three criteria:
pupil gain, pupil ratings, and administrative ratings. Six of these
studies appear to favor women, three show no differences between
effectiveness of men and women, and two studies favor men. In studies
conducted prior to 1940, in no instance apparently was the significance of
the obtained difference between teaching effectiveness of men and women
teachers tested. In the four later investigations significance was
determined but in only one study, that of Cheydleur (75), was a
significant difference found, a critical ratio of 6.6 being reported in
favor of women instructors.

As indicated by the foregoing studies the question as to whether or
not women teachers are superior to men teachers has been considered for
some years. The problem may not be merely one of academic interest but
may have practical or economic implications for some school and college
administrations. No particular differences have been shown when the
relative effectiveness of men and women teachers has been compared. In
view of the results found, it may well be that consideration should be
given to assessing the effectiveness of women instructors in Air Force
technical schools. In case of full scale mobilization women, both
civilian and WAF, would seem to offer an invaluable potential source of
instructional personnel. Employment of greater numbers of women
instructors than at present would release like numbers of technical
specialists who would then be available for combat support in their
specialty.



The Relation of Marital Status to Instructor Effectiveness

While in some parts of the country there has been considerable
opposition, generally for economic reasons, to the holding of teaching
positions by married women, there appears to be little evidence that
married teachers are in any way inferior to unmarried teachers. The
reviewers found only three investigators who had made any objective study
of the question. In 1934 Peters (253) conducted a rather comprehensive
study of the status of the married woman teacher. He matched according to
age, education, teaching situation, and so on, 110 married with 110 single
elementary school teachers and compared the gain of 2195 pupils of the
former group with that of 2250 pupils of the latter group. Supervisory
ratings (made by superintendents or principals) were obtained for 1123
married teachers and 1123 single teachers matched on the same variables
as above. Differences in achievement and mental growth of pupils of the
married women teachers as compared with the single teachers as shown by
scores on the Otis Classification Test Parts I (achievement) and II
(mental growth) were .86 ± .29 and .60 ± .23, respectively. These
differences in favor of the pupils of the married teachers were just under
three times the probable error of the differences or on the border line
of being significant. Differences in supervisory ratings of married and
unmarried teachers were too small to be significant.
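
The probable-error convention used in the Peters study can be related to
the more familiar standard-error scale. The sketch below assumes the
± figures are probable errors of the differences, uses the standard
relation that one probable error equals 0.6745 standard errors, and
applies a normal approximation for the two-tailed probability. The normal
approximation is an assumption made here for illustration and is not
stated in the text.

```python
import math

PE_PER_SE = 0.6745  # one probable error expressed in standard errors


def z_from_probable_error(difference: float, probable_error: float) -> float:
    """Convert a difference and its probable error to a normal deviate."""
    standard_error = probable_error / PE_PER_SE
    return difference / standard_error


def two_tailed_p(z: float) -> float:
    """Two-tailed normal probability of a deviate at least as large as z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Differences and probable errors as reported for the Otis test parts.
for label, diff, pe in [("achievement", 0.86, 0.29), ("mental growth", 0.60, 0.23)]:
    ratio = diff / pe
    z = z_from_probable_error(diff, pe)
    print(f"{label}: diff/PE = {ratio:.2f}, z = {z:.2f}, p = {two_tailed_p(z):.3f}")
```

On this reading a difference of about three probable errors corresponds
to roughly two standard errors, which is consistent with the reviewers'
description of the achievement difference as on the border line of
significance.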

In 1951 Ryans (286) compared 99 single women with 107 married women 
third and fourth grade teachers with respect to ratings made by trained 
observers. Dimensions observed included 20 items relating to directly 
observable teacher behavior and 6 items referring to pupil behavior. 
Comparison of mean criterion scores with respect to marital status revealed 
no differences that were significant at or near the .05 level of confi- 
dence. When the relation of marital status to pupil behavior alone was 
studied for the 206 teachers, a coefficient of mean square contingency 
of .11 was obtained. 
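
The size of a coefficient of mean square contingency can be judged by
converting it back to the chi-square from which it is derived. The sketch
below uses Pearson's definition, C = sqrt(chi-square / (N + chi-square)).
The text does not report the dimensions of the contingency table, so the
degrees of freedom needed for a significance test remain unknown and are
not assumed here.

```python
import math


def chi_square_from_c(c: float, n: int) -> float:
    """Chi-square implied by a contingency coefficient C on n cases."""
    return n * c * c / (1.0 - c * c)


def c_from_chi_square(chi2: float, n: int) -> float:
    """Pearson's coefficient of mean square contingency."""
    return math.sqrt(chi2 / (n + chi2))


# Values reported for the 206 third and fourth grade teachers.
chi2_implied = chi_square_from_c(0.11, 206)
print(f"implied chi-square = {chi2_implied:.2f}")
```

A chi-square of this size is small for any table with more than one
degree of freedom, which fits the report of no significant differences.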



The Relation of Teaching Aptitude, Attitude Toward Teaching,
And Interest to Instructor Effectiveness

Teaching Aptitude versus Instructor Effectiveness

The results of the ten investigators using several measures designed
to predict teaching ability show great disparity. In Table 30 entries
have been arranged according to teaching aptitude test instead of
chronologically in order to improve comparability of studies. As will be
seen from Table 30, correlation coefficients between various criteria of
effectiveness and the Knight aptitude test ranged from -.10 to .78, the
largest being reported by Cooke using nine teacher subjects. The Morris
Trait Index-L test, apparently devised to indicate leadership aspects of
teaching aptitude, gave correlation coefficients between scores on this
test and various criteria of teaching competence from -.17 to .23. In the
case of the Coxe-Orleans Aptitude Test the range of coefficients with
various criteria of teaching efficiency was -.32 to .51. Dodd (100)
suggests that the Coxe-Orleans test measures qualities related to general
scholarship rather than to teaching success as revealed by supervisors'
ratings of practice teaching. The range for the Stanford aptitude test
was -.15 to .14. For the George Washington University Aptitude Test a
coefficient of -.19 was reported by Seagoe (299).
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Tabla 30 

Relation of Scores on Measures of Teaching Aptitude to Teaching Effectiveness

Investigator             Teacher sample                          Measure of effectiveness                      Correlation

Knight Aptitude-Elementary
Knight (1925)            33 elementary                           Fellow teacher rating                          .45
                         7 high school                           Fellow teacher rating                          .15
                         33 elementary                           Supervisor rating                              .77
                         7 high school                           Supervisor rating                              .00
Tiegs (1925)             25 elementary, 1 sem. experience        Supervisor rating                              .02
Bathurst (1929)          (number unreported) elementary          Administrator rating                           .50
Barr, et al. (1935)      66 elementary                           Pupil gain A.Q.                               -.01
                                                                 Pupil gain (composite A.Q. & raw gain)        -.10
                                                                 Superintendent rating (composite 7 scales)     .26
Cooke (1937)             27, 18, 9 elementary & high school      Self-rating                                    .21, .22, .36
                                                                 Supervisor rating                              .32, .12, .76

Morris Trait Index-L
Barr, et al. (1935)      66 elementary                           Pupil gain A.Q.                               -.11
                                                                 Pupil gain (composite A.Q. & raw gain)        -.04
                                                                 Superintendent rating (composite 7 scales)     .06
Phillips (1935)          173 elementary & jr. high school        Superintendent & principal rating              .20
                                                                 Sigma score rating                             .23
Rolfe (1945)             47 elementary, 1- & 2-room              Pupil gain                                    -.17
Rostker (1945)           28 elementary, rural¹                   Pupil gain (social studies)                    .20
Seagoe (1946)            25 elementary, 2 yr. experience         Administrator rating                           .00

Coxe-Orleans Aptitude
Coxe & Cornell (1933)    500 (approx.) elementary, 1 yr.         Supervisor rating                             -.03
                           experience
                         400 (approx.) elementary, 2 yr.         Supervisor rating                              .03
                           experience
                         112 elementary, 2 yr. experience        Composite observer rating                      .06
Phillips (1935)          173 elementary & jr. high school        Average superintendent rating                  .16
                                                                 Sigma score rating                             .26
Cooke (1937)             9-46 elementary & high school           Self-rating                                   -.32 to .04
                                                                 Supervisor rating                             -.12 to .51
Seagoe (1946)            25 elementary, 2 yr. experience         Supervisor ranking                             .01

Stanford Aptitude (3 subtests)
Rostker (1945)           28 elementary, rural¹                   Pupil gain                                     .02, .04, .10
Rolfe (1945)             47 elementary, 1- & 2-room              Pupil gain                                    -.15, -.13, .06
Seagoe (1946)            25 elementary, 2 yr. experience         Supervisor ranking                             .02, .04, .14

George Washington University Aptitude
Seagoe (1946)            25 elementary, 2 yr. experience         Supervisor ranking                            -.19

¹Exclusive of 1- and 2-room schools



In 1952 Jarecke (165) made an initial report of a teaching judgment 
test he had devised which follows a somewhat different pattern from other 
tests of this type. Jarecke's instrument is a situational test of a
forced-choice ranking type. A list of problem situations typical in the 
daily life of a teacher is presented. There are five alternate solutions 
offered for each situation. Solutions are to be ranked in order of favor- 
ableness. All solutions are of the nonoptimum or poor type on the theory 
that good teachers could discriminate between varying degrees of poor alter- 
natives while poor teachers would tend to rank higher the one they them- 
selves might employ. Jarecke reports very high correlations (.68 to .93) 
when scores on the teaching judgment test were correlated with various 
criteria of teaching effectiveness. Unfortunately, however, these re- 
ported correlations are spuriously high because the population on which 
they were obtained included the population on which the scoring key was 
based, thus making it difficult to evaluate the test on the basis of pres- 
ent data. 
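
Why a scoring key inflates the correlation in its own derivation sample
can be shown with a small simulation. This is a sketch under stated
assumptions and not a reconstruction of Jarecke's procedure: the items
are pure random noise, the sample sizes and number of keyed items are
hypothetical, and the key simply keeps the items that happen to correlate
most strongly with the criterion.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_teachers=60, n_items=200, n_keyed=20, seed=0):
    """Return (r in the key-building sample, r in a fresh sample)."""
    rng = random.Random(seed)

    def draw_sample():
        # Items and criterion are independent noise: the true validity is zero.
        items = [[rng.gauss(0, 1) for _ in range(n_teachers)] for _ in range(n_items)]
        criterion = [rng.gauss(0, 1) for _ in range(n_teachers)]
        return items, criterion

    key_items, key_criterion = draw_sample()
    new_items, new_criterion = draw_sample()

    # Build the key: keep the items most correlated with the criterion,
    # scored in the direction of their observed correlation.
    item_r = [pearson(item, key_criterion) for item in key_items]
    keyed = sorted(range(n_items), key=lambda i: abs(item_r[i]), reverse=True)[:n_keyed]
    sign = {i: (1 if item_r[i] > 0 else -1) for i in keyed}

    def score(items):
        return [sum(sign[i] * items[i][t] for i in keyed) for t in range(n_teachers)]

    return pearson(score(key_items), key_criterion), pearson(score(new_items), new_criterion)


r_same_sample, r_new_sample = simulate()
print(f"same sample: {r_same_sample:.2f}   new sample: {r_new_sample:.2f}")
```

Even though the items carry no real information, the key correlates
substantially with the criterion in the sample used to build it and
falls toward zero in a fresh sample. This is why a coefficient computed
on the key-building population cannot be taken at face value.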

At first glance it might appear informative to examine the factors 
that have been considered worth including in tests of teaching aptitude 
together with the underlying rationale and implicit hypotheses. The re- 
viewers are of the opinion, however, that rationale or hypotheses or the 
methods used to implement them have been inadequate. If one knew what 
kinds of things were important to instructor effectiveness and were able 
to construct devices for measuring both the instructors' knowledge of
these things and the probability of their shaping their behavior in ac- 
cordance with them in an instructional situation, the use of aptitude 
tests would seem to be a reasonable approach. 

It may be that there is a specific aptitude for teaching which is 
related to effectiveness of teacher performance. Data thus far avail- 
able, however, either fail to establish the existence of any such apti- 
tude with any degree of certainty or indicate that the tests used were 
inappropriate to its measurement. 



Teaching Attitude versus Instructor Effectiveness 

Attitude toward teachers and teaching, as indicated by the Yeager 
Scale devised for its measurement, appears to bear a small but positive 
relationship to teacher success measured in terms of pupil gains. Rolfe 
(280) administered a battery of tests to 47 rural teachers. He reported 
a correlation coefficient of .22 between pupil achievement in citizen- 
ship and teacher scores on the Yeager Scale. He also found a coefficient 
of .38 between this success criterion and teachers' scores on the Hartmann
Social Attitude Test. With 28 teachers as subjects, Rostker (282)
reported a coefficient of .45 between teachers' Yeager scores and measurable
changes produced in their pupils in social studies. LaDuke (188), who
correlated scores of 31 rural teachers on the Yeager test with "objective"
tests of pupil gain in attention, appreciation, information, interest, and
a composite of these, found coefficients ranging from zero to .20.



Interest in Teaching versus Instructor Effectiveness

Operationally, interest in teaching may be quite different from
attitude toward teachers or teaching. That an effective teacher should be
interested in teaching would appear to be so obvious as to be axiomatic.
A few investigators have attempted to show that among successful teachers
interest in teaching developed during the teachers' secondary school
period or before. In the majority of investigations, however, interest in
teaching was measured by interest test scores which indicate similarity
of interests of teachers and persons undergoing the interest test. The
results of these studies are shown in Table 31.

As will be seen from Table 31 those correlations resulting from the
use of the Strong interest test or modifications of it and the test used
by Coxe and Cornell (87) all tend to cluster around zero. The Link
Activities and Interest Inventory on the other hand shows such
inconsistencies in the light of the rather sparse data available as to
render it also of somewhat doubtful value.

The Kriner (182) study which produced such high correlations was based
on recall by the teachers as to their interests when they were in high
school. Obviously there is no way of keeping such opinions free from the
influence of later experience of success or failure, thus making the
correlations obtained practically meaningless. The Lins (203)
investigation, on the other hand, was a follow-up study. Students listed
their choices as to occupations when they first entered college and these
choices were correlated against ratings received some years later.

In 1952 Ringness (277) reported a study in which he attempted: (1) to
discover, if possible, any common factors that may underlie the reasons
given by undergraduates for the choice of teaching as a profession; (2) to
determine whether the answers given to essentially the same questions in
two different types of testing devices reveal comparable data; and (3) to
investigate the relationship between the reasons given for choice of
profession and subsequent teaching success as measured by criteria of
efficiency and acceptability. A paired-comparison and a ranking
questionnaire were used to determine the reasons for choice of teaching as
a career. Data were analyzed by the centroid method of factor analysis to
find the common factors. Sixty-three men and 37 women student teachers
comprised the sample used in Parts One and Two of the study, and 16 men
and 1^ women with one-year experience were used in the last part of the
study. Criterion of teaching success was an "acceptability" rating by the
superintendent. This was an over-all rating made after an interview of the
superintendent by the investigator in which questions were asked which
related not only to teaching efficiency but also to personality traits of
many kinds.

In the factor analysis study factors identified as interests in working
conditions, in people, in security, and in subject matter area to be
taught seemed to be generally emphasized. Desire for professional
advancement did not appear to be a general characteristic of the factor
structure, nor did desire for service to society or prestige and respect
of the profession. Factors bearing similar labels were found in analyzing
the results for men and women. However, these factors were only broadly
alike and had somewhat different arrangements and loadings of the
variables. An interest in "security," for example, as interpreted from the
men's data is not precisely that interpreted from the women's data.

Correlations between reasons for choice of teaching as a profession and
acceptability ratings differed slightly between the men and women subjects.
Items which had a correlation of .30 or higher, in either the
paired-comparison or ranking questionnaire, for the women were: "relatively
good financial reward," "ease of getting a position," "clean, attractive
physical surroundings," "short working hours," "frequent vacations," and
"environment of interesting co-workers." Items which had a correlation
of .30 or higher for the men were: "security against job loss and layoffs,"
"clean, attractive physical surroundings," "opportunity for professional
advancement," "opportunity to serve society," "ease of getting a position,"
and "opportunity to pursue a favorite interest." Multiple-correlation
coefficients between acceptability ratings and raw scores in the men's
paired-comparison questionnaire were .64, and for the women's questionnaire
.44. Multiple correlation coefficients between acceptability ratings and
raw scores for the ranking questionnaire were .76 for men and .78 for women.

It appears to the reviewers that Ringness may have gone somewhat further
in his interpretation of his data than the size of his N's justifies.
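
The reviewers' caution about small N's can be illustrated with the usual
adjusted R-squared (shrinkage) formula, which estimates how far a
multiple correlation would fall in a new sample. This is a sketch under
stated assumptions: the multiple R of .76 and the sample of 16 men are
taken from the text, but the number of predictors entering the multiple
correlation is not reported, so the value of 6 used below is purely
hypothetical.

```python
import math


def shrunken_multiple_r(multiple_r: float, n_cases: int, n_predictors: int) -> float:
    """Adjusted multiple R from the usual adjusted R-squared formula.

    Returns 0.0 when the adjusted R-squared is negative.
    """
    r_squared = multiple_r ** 2
    adjusted = 1.0 - (1.0 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return math.sqrt(adjusted) if adjusted > 0.0 else 0.0


# R = .76 on 16 men is from the text; 6 predictors is a hypothetical
# figure chosen only to show the size of the effect.
r_adjusted = shrunken_multiple_r(0.76, 16, 6)
print(f"adjusted multiple R = {r_adjusted:.2f}")
```

With so few cases per predictor the estimated coefficient drops
noticeably, and with more predictors it would drop further. This is the
sense in which small N's limit the interpretation of multiple
correlations.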



The Relation of Voice and Speech Characteristics
To Instructor Effectiveness



Shannon (303), in 1928, reported that the teacher's voice was placed
eleventh in order of importance among qualities listed by 3317 high school
pupils and ninth in importance by 107 university students. One hundred
twenty-four critic teachers placed voice second among personal and social
traits considered essential to effectiveness that were found to be weak
in student teachers under their direction. Voice did not appear among
the 15 most important qualities mentioned by 97 supervisors.

In 1951 Richey and Fox (274) had 1883 high school boys and 2022 high 
school girls in Indiana check characteristics that pertained to their best- 
liked and least-liked teachers. Among characteristics of the best-liked 
teachers, the item, "had a pleasant speaking voice," was marked by 76$ of 
the boys and by 84$ of the girls. Of the characteristics of the least- 
liked teachers, "had bad speaking voice" was designated by 39$ of the 
boys and by 37$ of the girls. 

In other investigations discussed under the section on Opinion Studies,
voice was mentioned among the ten most important teaching characteristics
in eight studies of high school pupils, nine studies of college students,
and two studies of administrative groups. Voice was not included among
the first ten traits in opinion studies of two grade school groups, four
studies of high school groups, seven college student studies, one study of
administrative opinion, and two opinion studies of teachers themselves.

In 1929 Barr (16) studied the characteristic differences in the teaching
performance of 47 good and 47 poor teachers of the social studies.
Twelve of the good and 17 of the poor teachers were listed as having good 
voices. Twenty-five good teachers and 7 poor teachers showed "conversa- 
tional manner." A repetition of the study with another group of teachers 
produced similar results. 

In 1941 Baxter (25) in an investigation of teacher-pupil relationships 
reported results when 42 teachers were studied by two observers. Voice and 
manner of effective teachers were said to be original and intriguing while 
noneffective teachers showed voice and manner that were prosaic and color- 
less. 



In 1943 Henrikson (150) made some comparisons of ratings of voice and 
teaching ability. Teachers were selected at random from the files of a 
placement bureau. Results are shown in Table 32. 



Table 32

Relation of Ratings of Voice and Teaching Ability

Variables                                          No. of    Correlation
                                                   cases     coefficient

Voice rated by supervisor of practice teaching
  vs. voice rated by school supervisor              433         .20

Teaching ability rated by school supervisor
  vs. voice rated by practice teaching supervisor   433         .20

Teaching ability rated by practice teaching
  grade vs. voice rated by supervisor               432         .27

Teaching ability vs. voice rated by same judge:
  Training school supervisor                        434         .62
  Public school supervisor                          580         .58

The last two correlation coefficients in Table 32 appear to be a
rather neat demonstration of the inability of judges to separate supposedly
different characteristics of individuals, viz., teaching ability and voice.
Other good examples of the same inability on the part of raters can be seen
in the investigations of Martin (218) and Henrikson (151). In 1944 when
Martin correlated superintendents' ratings of 123 teachers after their
first year of teaching with the same superintendents' evaluation of voice
and mechanics of speech, she obtained a correlation coefficient of .58.

In a later study in 1949 Henrikson (151) investigated relations between 
personality, speech characteristics (voice, pitch, rate, quality), and 
teaching effectiveness of college teachers as rated on a five-point scale 
by 150 college students enrolled in a speech course. He reported coefficients
of contingency ranging from .42 to .66 and chi-square values showing
significant relationships between various qualities of instructors as 
determined by the student ratings. 
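
The relation between a chi-square value and a coefficient of contingency can be sketched as follows. This is a minimal illustration, not Henrikson's computation: it assumes Pearson's coefficient C = sqrt(chi-square / (chi-square + N)), assumes N = 150 observations (one per student rater, which the text does not state), and assumes a 5 x 5 table for two five-point ratings. All function names are hypothetical.

```python
import math


def contingency_coefficient(chi_square, n):
    """Pearson's coefficient of contingency C for a chi-square based on n cases."""
    return math.sqrt(chi_square / (chi_square + n))


def chi_square_from_c(c, n):
    """Invert the formula: the chi-square that yields coefficient c with n cases."""
    return c * c * n / (1.0 - c * c)


def max_contingency(k):
    """Upper bound of C for a k x k table; C can never reach 1.0."""
    return math.sqrt((k - 1) / k)


# Assumed N = 150; the .42 and .66 are the extremes reported in the text.
N_ASSUMED = 150
for c in (0.42, 0.66):
    chi2 = chi_square_from_c(c, N_ASSUMED)
    print(f"C = {c:.2f} corresponds to chi-square of about {chi2:.1f}")
print(f"Ceiling of C for a 5 x 5 table: {max_contingency(5):.3f}")
```

Because C has a ceiling below 1.0 that depends on table size, contingency coefficients are not directly comparable with product-moment correlations of the same numerical value.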

From the studies reviewed above it appears, in general, that the
quality of the teacher's voice is not considered too important by school 
administrators, teachers, and students. Halo effect or "logical error," 
which so often has been found a contaminating factor in ratings, also 
appeared to be present to a large extent in these studies. 

A study made by McCoard (210) in 1944 on speech factors as related
to teaching efficiency appears somewhat more promising. Speech effectiveness
of 40 teachers in one-room schools was measured by having 22
speech teachers rate each teacher on a seven-point scale on each of 14
speech factors. Recordings were made while each teacher read standardized
material for three minutes and also spoke for three minutes on an assigned
topic. A special pronunciation test was also administered. Correlations
were obtained between the gains of 338 seventh and eighth grade pupils in 
a citizenship test and their teachers' speech scores. In the reading ex- 
periment 12 of the 14 ratings on speech factors and the total speech 
score were significantly correlated with student gains at the .01 level, 
and the correlations of the other two speech factors with student gains 
were significant at the .05 level of confidence. The coefficients ranged
from .34 to .46. In the speaking experiment two speech factors, vari- 
ation in pitch and variation in quality, had correlations with student 
gains that were significant at the .01 level. Eight speech factors and 
the total were significantly related to gains at the .05 level of con- 
fidence. 

Correlations obtained between a composite of effectiveness ratings 
by supervisors and reading scores were all significant at the .05 level 
and all but two were significant at the .01 level of confidence. The 
correlation between total speech scores (reading and speaking combined) 
and supervisors' ratings was .49. Intercorrelations among various
speech factors (pitch, quality, volume, rate, phrasing, distinctness,
etc.) centered around .90, which led the author to conclude that even
with trained judges an indication of general speech ability based on a
single factor will give as good results as a total of judgments on
several factors. McCoard reported correlation coefficients between
pronunciation test scores and other teacher measures as follows: pupil gain
.02, supervisors' ratings .40, total reading score .49, total speaking score
.40.



In 1950 Huckleberry (159) investigated the possible relationship of 
speech to student teaching. He also attempted to develop means of iden- 
tifying significant speech qualities of student teachers and observed the 
effect of improvements of speech on student teaching competency. Three
speech teachers rated recordings of 54 volunteer subjects (24 in the ex- 
perimental group and 30 in the control group) in terms of articulation, 
pronunciation, voice quality, voice pitch, inflection, rate, rhythm, and 
conviction. Huckleberry concluded that positive change in student teach- 
ing proficiency, as observed by critic teachers, was directly associated 
with positive change in rated speech proficiency. The reviewers compared 
the correlation coefficients of his experimental and control groups, how- 
ever, and found the differences were not statistically significant. 
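
A comparison of two correlation coefficients from independent groups, of the kind the reviewers describe, can be sketched with Fisher's r-to-z transformation. This is an illustration only: the text does not say which test the reviewers applied, and the two r values below are hypothetical; only the group sizes (24 and 30) come from the text.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-sided p-value for the difference between two
    independent Pearson correlations, using Fisher's r-to-z transformation."""
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


# Hypothetical correlations; group sizes are those reported for Huckleberry.
z, p = fisher_z_difference(r1=0.50, n1=24, r2=0.30, n2=30)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With groups this small, even a sizable difference between two coefficients falls well short of conventional significance levels, which is consistent with the reviewers' caution.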

While research on voice and speech characteristics tends to be some- 
what scanty, this area appears promising for research in the Air Force 
technical school situation. It is possible that voice, apart from other 
variables, plays an important part in supervisors' ratings. It may be,
too, that speech characteristics constitute a crucial instructor variable, 
that in addition to certain subject-matter knowledge or other prerequisites, 
the competent instructor is the one whose voice appeals to his class. It 
may be, on the other hand, that "actions speak louder than words," that 
the instructor who "knows his stuff" and is able to demonstrate his knowl- 
edge has little need for words. A potentially fruitful research approach 
to this problem might be first, to determine the extent to which student 
gains in Air Force schools are related to the instructors' oral presenta- 
tion; and second, to determine whether or not this ability can be measured 
prior to selection for the instructor assignment. 



The Photograph as a Predictor of Instructor Effectiveness 

Many school administrators require a photograph of the applicant to 
accompany letters of application for teaching positions. In order to de- 
termine the validity of this alleged aid to selection, Tiegs (333), in 
1928, evaluated photographs as a means for teacher selection. He re- 
ported that rankings by five judges of teaching effectiveness of 25 ele- 
mentary school teachers on the basis of photographs gave rise to inter- 
correlations among them ranging from .00 to .50. Official ratings of 
the 25 teachers given by superintendents, after the rating forms had been 
checked by principal and general supervisor, when compared with rankings 
by photograph produced a correlation coefficient of -.08. 
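
Agreement between two sets of rankings can be expressed as a rank correlation. The sketch below uses Spearman's rank-difference formula for untied ranks; the text does not state which coefficient Tiegs computed, and the six rankings shown are hypothetical, not his data.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same people.
    Valid only when neither ranking contains ties."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Hypothetical example: official ranking of six teachers versus the
# ranking one judge assigned from photographs alone.
official_ranks = [1, 2, 3, 4, 5, 6]
photo_ranks = [3, 6, 1, 5, 2, 4]
print(f"rho = {spearman_rho(official_ranks, photo_ranks):.2f}")
```

A coefficient near zero, as in this made-up case, means the photograph ranking carries essentially no information about the official ranking.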

Johns and Worcester (169), in 1930, also attempted to submit the photograph
to an experimental check. In their study 6 faculty members of a
teachers college ranked 6 men school superintendents or principals, 6 high
school, 6 elementary school, and 6 kindergarten and primary women teachers
on teaching effectiveness. Photographs of these 24 administrators or
teachers were then mailed to 148 judges: 61 superintendents, 38 school
board secretaries, and 49 placement bureau secretaries. The judges were
asked to rank members of each group from the photographs as to their
desirability as teachers. The results showed every photograph assigned
every rank from one to six by every class of judge. Correlations between
composite rankings of judges of photographs and faculty committee
rankings were: for superintendents and principals, -.10; for high school
teachers, .14; for elementary school teachers, -.01; and for
kindergarten-primary teachers, .37. No one judge of photographs in the whole 148
agreed with the faculty committee ratings for any of the groups ranked.



Statistical Analyses of Instructor Abilities

Nine studies (12, 70, 142, 149, 189, 277, 295, 287, 288, 289, 314) report 
results of factor analyses of data from presumed measures of teaching abili- 
ties. In 1932 Butsch (70) by means of a tetrad difference analysis found a 
general factor among the intercorrelations of judgments of teacher traits. 
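
The logic of a tetrad difference analysis can be sketched briefly. Under Spearman's single-general-factor model each correlation is the product of two loadings, so for any four variables the tetrad differences (such as r12*r34 - r13*r24) vanish. The loadings below are hypothetical and are not taken from Butsch's data.

```python
from itertools import combinations


def tetrad_differences(r):
    """Return the three tetrad differences for every set of four variables
    in the symmetric correlation matrix r (a list of lists)."""
    results = {}
    for a, b, c, d in combinations(range(len(r)), 4):
        results[(a, b, c, d)] = (
            r[a][b] * r[c][d] - r[a][c] * r[b][d],
            r[a][b] * r[c][d] - r[a][d] * r[b][c],
            r[a][c] * r[b][d] - r[a][d] * r[b][c],
        )
    return results


# Hypothetical general-factor loadings for four rated traits.
loadings = [0.9, 0.8, 0.7, 0.6]
one_factor_matrix = [
    [1.0 if i == j else loadings[i] * loadings[j] for j in range(4)]
    for i in range(4)
]

for variables, tetrads in tetrad_differences(one_factor_matrix).items():
    print(variables, [round(t, 6) for t in tetrads])
```

In practice observed tetrad differences are compared with their sampling error; values that are all near zero are taken as evidence that a single general factor accounts for the intercorrelations.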

In 1943 Smalzried and Remmers (314) applied the Thurstone method of factor
analysis to student ratings of 40 practice teachers on the Purdue Rating
Scale for Instructors. Two factors emerged which they designated "empathy"
and "professional maturity." Items which had greater saturation of "empathy" 
were fairness in grading, personal appearance, sympathetic attitude toward 
students, and liberal and progressive attitude. The items with the greater 
loading for "professional maturity" were self-reliance, confidence, and pre- 
sentation of subject matter. The other items of the scale show lower and 
more nearly equal saturation with both basic factors. 

Hellfritzsch (149), in 1945, reported a factor analysis of some 27 
teacher variables using data from the Rostker (282) and Rolfe (280) studies.
He concluded that four independent primary teacher abilities satisfactorily 
explain the intercorrelations observed between a battery of measures com- 
monly used in investigations of the nature, measurement, and prediction of 
teaching ability. These he identified as: general knowledge and mental
ability; teacher rating scale factor; personal, emotional, and social
adjustment; eulogizing attitude toward the teaching profession. The four
factors were uncorrelated with each other. Each of the several teacher 
measures was dependent primarily upon only one of the factors. Hellfritzsch 
also stated his study revealed that no single teacher measure of those he 
used could validly be substituted for the actual measurement of pupil 
growth in evaluating the ability of teachers to teach. Supervisory ratings, 
he found, were only slightly related to observed pupil growth in social
studies and, hence, Hellfritzsch concluded were of doubtful value as a 
measure of teaching effectiveness conceived in terms of ability to pro- 
mote pupil growth. 

In 1950 Schmid (295) conducted an investigation to determine by means
of factor analysis if a few common factors might adequately summarize
areas of personality and ability of prospective teachers. Scores were
obtained by means of the Washburne Social Adjustment Inventory, Mooney
Problem Check-List, Minnesota Multiphasic Personality Inventory, and
personal data from student files with respect to 24 traits for 51 male and 51
female student teachers. The size of the total group tested varied from 
80 to 101 for the different variables. Schmid hypothesized that the fac- 
tor patterns would differ for males as compared to females and ran separate 
analyses by sex. Unfortunately this reduced the number of individuals 
represented by each correlation coefficient to such low figures (40 to 51)
as to make the results of his analyses highly tentative. Factor analysis 
of female scores yielded four common factors, identified as "problems in 
response set," "professional maturity," "introversion," and "social ad- 
justment." In general, Schmid says, these factors failed to cut across 
areas measured by the personality measures he used perhaps indicating that 
these instruments are measuring different aspects of personality. Factor 
analysis of the male scores resulted in two common factors, "social and 
educational adjustment" and a "personality-psychological" factor. The 
factor pattern of the male students showed a marked discrepancy from that 
of the female students. 

In 1951 Lamke (189), in a factor analysis of personality characteristics 
as measured by Cattell's 16 Personality Factor Test for 10 good and 8 poor
high school teachers with one year's experience, found that responses of 
good and poor teachers did not fall into two well-defined and characteristic 
patterns. There was some indication that some good teachers differed from 
some of the poor teachers on the responses associated with Cattell's source 
traits F (surgency vs. desurgency or anxious agitated melancholy), H
(adventurous cyclothymia vs. withdrawn schizothymia), and N (sophistication
vs. simplicity). The reviewers are inclined to doubt the significance, both
statistical and practical, of factor analytic studies based on 18 cases.

Ryans as part of the "Teacher Characteristic Study" has made a factor 
analysis of trained observer ratings of elementary and secondary teach- 
ers on a classroom observation scale containing 20 items referring to 
teacher behaviors and 6 referring to pupil behavior. Results of this 
study have been published in a number of different references (287, 288, 
289). A detailed account of the factors found is given in the section 
on Objective Observation of Instructor Performance. 

In 1951 Hampton (142) published the results of a factor analysis of
supervisory ratings of elementary teachers. Two different scales, a
paired-comparison scale and a graphic rating scale, were used. Hampton
concluded that a general factor did not account for the intercorrelations
of the ratings on either instrument. Furthermore, he concluded that a
greater number of factors was needed, namely six as compared with three, to
account for the intercorrelations of the same traits on the paired-comparison
instrument than was needed to account for the intercorrelations of the
ratings on the graphic scale.

In 1952 Bach (12) used the factor analysis approach in a study of the
relationship of critic teacher ratings of subjects as student teachers and
supervisory ratings of the same subjects after they had had actual teaching
experience. Bach found four factors for each of the two ratings, but only
two of these appeared to be similar.

In 1952 Ringness (277) factor analyzed data concerning reasons given
by teachers for choice of teaching as a career. This material is discussed
in detail in the section on The Relation of Teaching Aptitude, Attitude
Toward Teaching, and Interest to Instructor Effectiveness.

Two investigators (220, 285) have reported results of item analyses of 
instructor traits, one in terms of student change and the other in terms 
of principals' assessments.

In 1940 Mathews (220) made an item analysis of measures of teaching
ability in relation to student change. By means of a battery of tests he 
derived a composite index of the changes produced in seventh and eighth 
grade pupils by 57 rural school teachers of social studies. The teachers 
were given a battery of 11 psychological, subject matter, and adjustment 
tests. Of the 1675 items in all tests given the teachers only 68 items,
or slightly over 4%, possessed statistical significance in terms of pupil
change. Mathews concludes that the findings cast serious doubt on the 
validity of the tests studied as measures of teaching ability when pupil 
change is used as a criterion. 
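
The arithmetic behind Mathews's conclusion can be made explicit. The sketch below compares the share of items found significant with the share expected by chance alone. The .05 significance level is an assumption; the passage does not state the level Mathews used.

```python
# Figures from the text.
total_items = 1675
significant_items = 68

# Assumed significance level (not stated in the passage).
alpha = 0.05

observed_share = significant_items / total_items
expected_by_chance = alpha * total_items

print(f"Observed share of significant items: {observed_share:.1%}")
print(f"Items expected to reach significance by chance: {expected_by_chance:.0f}")
print(f"Items actually reaching significance: {significant_items}")
```

Under that assumption, the number of significant items is no larger than the number that would be expected if no item had any real relation to pupil change, which supports the doubt expressed about the tests.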

Ryans (285), in 1951, applied analyses of internal consistency and ex- 
ternal validation procedures to test items measuring the professional in- 
formation of 192 elementary and 165 secondary teachers with one or more 
years' experience. He used three teacher measures: (a) scores on the
General Principles and Methods of Teaching Test of the 1949 National
Teacher Examination battery; (b) principals' assessments by means of an
observation blank of teacher behavior in terms of pupil behavior, teacher
personal-social behavior in the classroom, and teacher behavior indicative
of intellectual and educational background; (c) principals' general
evaluation of teachers' over-all effectiveness on a graphic rating scale.
The two principals' ratings produced an intercorrelation coefficient of
.83 for both elementary and secondary teacher groups, which might be
expected because of the common factors involved. Upper and lower [fraction
illegible] of teachers were segregated and analyses of the three measures
and item discrimination indexes for the teachers' test were computed for these
groups. The General Principles and Methods of Teaching Test, Ryans
concluded, appeared to be made up of items that functioned satisfactorily
from the standpoint of internal consistency. However, when the test items
were analyzed against either of the principals' ratings less than 20% of
the 45 items discriminated significantly at the .05 level or better between
high and low elementary teachers. Only 5% of the items discriminated
between high and low secondary teachers. Ryans attributes these somewhat
unsatisfactory results to ". . . the doubtful validity and reliability of
the assessments upon which the external criteria were based, the low
reliability of individual items, and the fact that understanding of educational
concepts comprises only one segment of over-all teaching effectiveness. . . ."

In the opinion of the reviewers all the studies of the foregoing
types so far reported suffer from inadequacies of criteria, tests, or 
numbers of cases. It still seems possible that a more adequately de- 
signed study might yield results of considerable basic importance to the 
solution of problems of evaluating and selecting instructors. 



Opinion Studies of the Personality Characteristics 
Of Effective and Ineffective Instructors 

For over fifty years attempts have been made to identify the person- 
ality characteristics of successful and unsuccessful teachers by making 
lists of traits based on opinions. In most cases these lists have been 
made up of subjectively estimated characteristics of such a vague, gen- 
eral nature as to render any precise measurement of them impossible. One 
of the earliest studies of this kind was that made by Kratz (181) in 1896.
When 2411 pupils were asked to indicate the characteristics of their best
teachers, the factors most frequently mentioned were: helped in studies,
personal appearance, good, kind, pleasant, happy, jolly, patient, polite,
neat. In 1929 Charters and Waples (74) collected some 2800 teacher traits
as reported by 27 teachers, 14 parents, 10 pupils, 3 teacher agency
executives, and 2 professors of education. It might be thought that this
exhaustive and comprehensive list would be the list to end all lists.
However, more papers using this approach have appeared since 1929 than ever
appeared before that date. 

In the search for traits, qualities, and characteristics of the successful
teacher, almost no stone has been left unturned. Table 33 lists
all available studies categorized according to the group from whom opin- 
ions were solicited. The studies are arranged in chronological order 
under each category. 

Several of the opinion studies that are somewhat interesting because 
of the novelty of the approach employed, the date of the study, or the 
magnitude of the effort involved will be briefly reviewed. 

In 1900 Bell (27), in a study of the teacher's influence, reported
results of a questionnaire completed by [number illegible] men and 488
women normal school students. In indicating characteristics of those
teachers that were most helpful the students' answers fell into four
groups: (1) moral influence; (2) personal interest, kindness,
encouragement, sympathy; (3) intellectual influence; (4) self reliance.
Almost all students indicated that they had had a teacher whom they
positively disliked or hated. The disliked teachers were reported to have
a malevolent attitude, either active or passive, resulting in such
behavior as unjust punishment, sarcasm, insult, and ridicule.

Shannon (303), in 1928, made a most comprehensive investigation of
opinions of the personal and social traits of successful and unsuccessful
secondary school teachers. He interviewed 97 "selected" supervisors; he
had 3317 high school pupils and 107 university students list good and bad
qualities of teachers; and he asked 124 critic teachers to list personal
and social traits found to be weak in student teachers under their
direction. Shannon also studied the problem by making analyses of traits used
on rating scales, recommendation procedures, reasons for teacher failure,
traits considered in certification, and codes of professional ethics for
teachers. Among teacher traits Shannon found to be considered most
important were such qualities as stimulative power, forcefulness, sympathy,
affability, self-control, and fairness.



In 1929 Jordan (173), in a study of personal and social traits as
related to high school teaching, used a questionnaire of 46 traits. The
15 traits considered of most importance, the 16 of medium importance,
and the 15 of least importance were checked by 150 high school pupils,
120 teachers, 100 supervisors, and 120 school patrons. As an example
of the outcome of typical studies of this kind, the 5 most important and
the 5 least important traits as listed by the various groups are given
in Table 34. The rather remarkable agreement among the four groups
studied suggests the probable existence of powerful cultural stereotypes
in the region where the study was conducted. This conclusion is
emphasized by the comparative lack of importance indicated by other studies
of certain factors judged among the most important in Jordan's study.






Table 34

The Five Most and the Five Least Important of 46 Teacher Traits
As Ranked by Four Groups of Judges (a)

Rank  Pupils               Teachers             Supervisors          Patrons

Most important traits

 1.   Fair                 Intelligent          Tactful              Intelligent
 2.   Intelligent          Tactful              Intelligent          Fair
 3.   Interesting          Healthy              Fair                 Broad-minded
 4.   Broad-minded         Broad-minded         Cooperative          Tactful
 5.   Cheerful             Cooperative          Healthy              Patient

Least important traits

42.   Dignified            Trustful             Ready of speech      In touch with life
43.   In touch with life   Willing to lead      Of broad interests   Trustful
44.   Thoughts centered    Reverent             Thoughts centered    Proud of profession
      outside of self                           outside of self
45.   Reverent             Modest               Willing to lead      Of broad interests
46.   Proud of profession  Thoughts centered    Modest               Willing to lead
                           outside of self

(a) Jordan (173).



In 1929 Klopp (177) gave results obtained by asking summer school pupils
in junior and senior high schools to compare 31 practice teachers
with an "ideal teacher" on 10 traits. A majority of the pupils rated their
student teachers as equal to the ideal teacher on eight of these traits
(kindness, neatness, fairness, patience, approachableness, sense of humor,
enthusiasm, willingness to help). Percentages for the different traits
ranged from 56% to 78%. The majority rated their teachers below the ideal
teacher for thoroughness (55%) and discipline (62%).

In 1932 Kyte (18?) asked 69 supervisors to analyze their most serious
problem teacher. The supervisors rated their unsuccessful teachers on 53
characteristics. Among these, those judged most important were
deficiencies in leadership, in influence on pupils' habits, in selection
of method, in control of class, and in work responsibility.

In 1936 Engelhart and Tucker (108) asked 224 high school pupils to
check a list containing 100 positive traits and their corresponding
opposites for the teacher they considered best and also for the one
considered the poorest. Of the 100 traits, 46 were found to correlate
significantly and positively with quality of teaching. The highest tetrachoric
coefficient of correlation was .93, the 46th was .32. Of the 46 traits
correlated, 25 were .72 or above. The traits showing tetrachoric
coefficients of .80 or higher were good judgment .93, clear in explanation .88,
respecting others' opinions .86, sincere .83, impartial .83, fair .82,
appreciative .80, interested in pupils .80, broad-minded .80.
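
A tetrachoric coefficient is estimated from a fourfold table, here trait checked or not checked against best or poorest teacher. The sketch below uses the familiar "cosine-pi" approximation rather than the full tetrachoric solution; the text does not say which computing procedure Engelhart and Tucker followed, and the cell counts are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a: trait checked, best teacher      b: trait checked, poorest teacher
    c: trait not checked, best teacher  d: trait not checked, poorest teacher
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("coefficient is undefined for this table")
    if b * c == 0:
        return 1.0
    if a * d == 0:
        return -1.0
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts: the trait is checked for 80 of 100 best teachers
# and for 20 of 100 poorest teachers.
r_tet = tetrachoric_cos_pi(a=80, b=20, c=20, d=80)
print(f"approximate tetrachoric r = {r_tet:.2f}")
```

The approximation is closest when both variables are split near the median; with very uneven splits it can depart noticeably from the exact coefficient.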

Tostlebe (334) made an analysis of the relative importance of various
training factors to success in the one-room rural school. A check list
of 135 items arranged as a four-point scale was marked by 40 specialists
in the field of teacher training and 40 county superintendents. The
split-half reliability coefficient for the specialists was .85 and for the
county superintendents .86. A correlation coefficient of .81 was
obtained between the judgments of the 40 specialists and the 40
superintendents. A weighted index was obtained for each of the 135 success
factors, which were then divided into fourths. The type of success
factors which most predominated in the top fourth were those centering
about assignments, individual differences, study periods, mastery of
fundamentals, unit method of instruction, adjusting programs, teacher's
personal self, and the relationships of the teacher to child and parents.
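
A split-half reliability coefficient of the kind reported here can be sketched as follows: items are divided into odd and even halves, the two half scores are correlated, and the result is usually stepped up with the Spearman-Brown formula to estimate full-length reliability. The text does not say whether the reported .85 and .86 were stepped up, and the small data set below is hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown full-length estimate).
    item_scores holds one list of item responses per respondent."""
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, 2.0 * r_half / (1.0 + r_half)


# Hypothetical responses of five judges to six items on a four-point scale.
responses = [
    [4, 4, 3, 4, 3, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 2],
    [1, 2, 1, 1, 2, 1],
    [4, 3, 3, 4, 4, 3],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, stepped-up estimate = {r_full:.2f}")
```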

Daniel (91) compiled opinions of 202 superintendents, 267 principals,
29 supervisors, 846 white teachers, 602 Negro teachers, 1659 white eighth 
grade pupils, 523 Negro eighth grade pupils, 998 white eleventh grade 
pupils, 378 Negro eleventh grade pupils, 1351 white patrons, and 973 
Negro patrons. Each of the above individuals indicated the qualifications
of the teacher whom they considered best within their experience. All
groups followed remarkably similar patterns giving first place to quali- 
ties related to professional interest and competency, followed by per- 
sonal qualities. 

In 1948 Witty (357) listed in order of frequency traits found in
14,000 letters submitted by pupils from Grades 1 to 12 in a contest
which required them to describe the teacher who had helped them most.
In a second study of 33,000 such letters the list remained substantially
the same. The 12 most frequently mentioned traits in order were:
cooperative and democratic attitude, kindliness and consideration of the
individual, patience, wide interests, personal appearance and pleasing
manner, fairness and impartiality, sense of humor, good disposition and
consistent behavior, interest in pupils' problems, flexibility, use of
recognition and praise, unusual proficiency in teaching.

Undesirable characteristics were also analyzed in the second study.
In order of frequency the 12 most often mentioned negative factors were:
bad tempered and intolerant, unfair and inclined to have favorites,
disinclined to show interest in the pupil and to take time to help him,
unreasonable in demands, tendency to be gloomy and unfriendly, sarcastic
and inclined to use ridicule, unattractive appearance, impatient and
inflexible, tendency to talk excessively, inclined to talk down to pupils,
overbearing and conceited, lacking in sense of humor.

In a study made at Brooklyn College and reported by Goodhartz (131)
in 1948 and by Riley et al. (276) in 1950, 6681 students at Brooklyn Col- 
lege selected from a 10-item list, 3 qualities which they considered to 
be of the greatest importance in a teacher in the biological and physical 
sciences, the social sciences, and the arts. This study has a certain 
unique value in that it secured opinions concerning teachers of different
subject matter and did not assume that all good teachers would have the 
same qualities regardless of the subject they taught. 

In 1949 Irwin and Irwin (162) obtained an appraisal of certain teacher 
traits by 415 senior high school students by having them list words that 
might be used in describing good and bad teachers. 

Bradley (50), in 1950, using an unstructured, open-end questionnaire
technique with 694 students from four college classes, found that with
respect to college teachers and their teaching, students like such
factors as "teaching efficiency," "meets students' needs," "puts subject
matter across," "facilitates learning." These were mentioned 1649 times,
or more than all other factors put together. Similarly, in terms of
dislike, the negatives of these factors appeared 1507 times, again more often
than all the other negative characteristics combined.

The results of all of this effort in conducting opinion studies of
instructor personality characteristics appear to be largely sterile in
terms of usability for evaluative or selective purposes. It seems quite
possible that anyone who had passed through the average American school
system could sit at his desk and devise an "armchair" list of
characteristics of the effective as opposed to those of the ineffective teacher
that would be quite as useful as any list thus far developed. The trend
in present day research in the area of selection and evaluation of
personnel is definitely directed away from opinion studies as sources of
ideas concerning the requirements of teaching and toward the use of
psychological theory and rationale in the development of systematic sets of
hypotheses to be tested with objective tests and observational techniques.

Carefully designed opinion studies of personality characteristics 
of instructors might lead to some understanding of why supervisors’ rat- 
ings of instructor effectiveness, which are based on opinion, fail to 
correlate with the student gains criterion. Investigation might also be
directed toward the problem of providing sounder bases for supervisor
judgment. It is possible that in such studies the use of some of the
more recent methodological refinements such as Stephenson's Q-technique
or Cattell's R-technique might be productive of more operationally useful
results.

It should be pointed out perhaps that mere collection of great masses
of data does not necessarily produce a more effective study. Adequate
sampling might have eliminated, for example, in the studies of Witty
(356, 357), the arduous task of going through 33,000 or even 14,000
letters without sacrifice of any meaningful finding.
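
The point about sampling can be illustrated with a short calculation. The sketch below draws a simple random sample of letters and computes the 95 percent margin of error for an estimated proportion, with a finite population correction. The sample size of 1000 is hypothetical, and simple random sampling is assumed; only the 33,000 figure comes from the text.

```python
import math
import random


def margin_of_error(p, n, population=None, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of size n, with optional finite population correction."""
    standard_error = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        standard_error *= math.sqrt((population - n) / (population - 1))
    return z * standard_error


population_size = 33000   # letters in Witty's second study (from the text)
sample_size = 1000        # hypothetical sample size

# Letters are identified here by hypothetical index numbers 0..32999.
random.seed(0)
sampled_letters = random.sample(range(population_size), sample_size)

# p = 0.5 is the least favorable case and gives the widest margin.
moe = margin_of_error(0.5, sample_size, population_size)
print(f"Letters to read: {len(sampled_letters)} of {population_size}")
print(f"Worst-case margin of error: about {moe:.1%}")
```

Under these assumptions, reading roughly one letter in thirty-three would fix the frequency of any trait to within about three percentage points, which is ample for ranking the most commonly mentioned traits.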



Causes of Teacher Failure

In a number of studies attempts have been made to set forth the causes
of teacher failure. Several of these (60, 68, 187, 213, 216, 234, 237,
278, 311) merely report summaries of superintendents' reasons for
dismissal of unsatisfactory teachers or give superintendents' opinions as to
what constitute the chief weaknesses of failing teachers. Other
investigators, Andersen (5) and Morrison (231), include reasons for failure as
reported by school board members. Mott (236) queried 200 teachers of
agriculture, while James (164) canvassed opinions of college freshmen,
school administrators, and teachers themselves. School principals were
included in Littler's (205) survey. McLaughlin (212) made a case study
of 98 effective and 16 ineffective female elementary school teachers.

The first such report available to the reviewers, that of Littler (205),
in 1914, mentioned weaknesses in maintaining discipline, in teaching skill,
interest, personality, effort, and cooperation as the most important causes
of teacher failure. Subsequent studies have more or less reiterated in
somewhat varied terms the findings of this earlier report. Poor
maintenance of discipline and lack of cooperation tend to be listed among
the chief causes of failure or dismissal in most of these studies. Health,
educational background, training, age, and knowledge of subject matter, on
the other hand, appear to be relatively unimportant factors. These
investigations are marked by a complete absence of operational definitions
of the terms used, so that any estimate as to the importance of the
various factors depends entirely upon the personal likes and dislikes,
preconceptions and misconceptions of the judges and upon their individual
interpretation of the terms. In none of these studies was any attempt
made to observe unsuccessful teachers systematically in order to
determine those specific behaviors which differentiate the ineffective from
the successful teacher. Another important consideration in evaluating
these studies of the causes of teacher failure is that the stated causes
may have been concocted after the decision to relieve the teacher of
further duties had been made.

As in the case of opinions regarding the unsuccessful teacher, many
judgments have also been made as to what constitutes good teaching
practice. No one knows, however, to what extent manifestly undesirable
behavior may be offset by presumably desirable factors. In other words, no
one has determined what constitute the allowable instructor idiosyncrasies.
A potentially fruitful approach to the problems of determining instructor
effectiveness might well be the investigation, through objective
observation techniques, of behavior characteristics commonly deemed by
educational authorities to constitute unsound teaching practices. Then
study should be made of the extent to which such pedagogically undesirable
behaviors may be present without appreciably reducing the efficiency of an
instructor in terms of pupil gain.



Personality Tests of Teachers 

Investigations of the relations of personality test scores to meas- 
ures of teacher success have yielded widely varying results. In Table 
35 are summarized results of studies in which attempts have been made
to relate various personality measures to measures of instructor
effectiveness. The material has been grouped according to the personality
measure used. It will be noted that correlation coefficients computed
between scores obtained on the several sections of the Bernreuter
Personality Inventory and various criteria of instructor effectiveness
range, for "neurotic tendency," from -.31 to .1?; for "self-sufficiency,"
from -.24 to .20; for "dominance-submission," from .00 to .33; and for
"extroversion-introversion," from -.14 to .01. Correlation coefficients
for the Bernreuter-Flanagan self-confidence scale range from -.38 to .00
and for the Bernreuter-Flanagan sociability scale from -.26 to -.06.

High scores on the Bell Adjustment Inventory and on the Thurstone 
Personality Schedule are associated with poor adjustment so that negative 
coefficients with effectiveness might be expected. As reported by vari- 
ous investigators these range from -.04 to -.40. The positive coeffi- 
cients given in the Gould (133) study probably indicate only that he 
reversed the direction of his scores so that the results among several 
sets of variables would have comparable directions. Although the 
tetrachoric correlation of .52 found by Cooper and Lewis (83) between
pupil rating and absence of neurotic sign on the Rorschach is higher
than is usually found with supposedly more "dependable" data, the authors
point out that extent of overlapping prohibits the use of neurotic signs
for individual prediction. An important feature of the Cook and Leeds
(80) and Leeds (198) studies was the use of item analysis against the
external criterion of teachers designated by their principals as the
best and worst in the schools in getting along with children. 

Ryans (266), in 1951, as part of the "Teacher Characteristic Study" 
referred to earlier, studied the relationship of scores on the Thurstone 
Temperament Schedule for the upper and lower 27% of a group of 275
elementary teachers selected on the basis of composite observer ratings.
These ratings had been factor analyzed by the centroid method and yielded
five oblique factors which appeared to refer to: (a) pupil participation
and teacher open-mindedness; (b) controlled pupil activity and
businesslike approach; (c) teacher calm and consistent, liked because
"human"; (d) sociability; (e) appearance and attractiveness. (This last
factor was not used in the analysis.) Differences for the "vigorous"
category of the Thurstone Temperament Schedule were significant at the .01
level for Factor (a); for the "impulsive" category at the .05 level for
Factor (d); for the "dominant" category at the .01 level for Factors (a),
(d), for total rating, and for rating of pupil behavior (taken
separately); for the "sociable" factor at the .05 level for Factors (a),
(d), and rating of pupil behavior.

Other personality tests given to teachers have included the Pressey
X-O Test (271), the Rudisill scale for measurement of the personality of
elementary teachers (132), the Occupational Personality Inventory (301,
102), tests of Cattell's primary source traits (297), Cattell's 16
Personality Factor Test (189), Johnson Temperament Analysis, Minnesota
Personality Scale, and Minnesota T-S-E Test (337). Correlation coefficients
where reported tend to be low and are probably not significant except
perhaps for some of those found by Schwartz (297). Using 34 teachers, he
reports coefficients ranging from -.32 to .28 when tests of "primary
source traits" were correlated with practice teaching rating, and
coefficients from -.60 to .31 (N = 18) when the "primary source traits" were
correlated with supervisors' ratings.

Lamke (189), in 1951, attempted to find out if the personalities of
good and poor teachers as evaluated by Cattell's 16 Personality Factor
Test were characteristically different. He used Fisher's discriminant
function and factor analysis in the examination of his data. Results
of the analysis by either method failed to reveal a characteristic
personality pattern for either the good or the poor teachers. Lamke says
the response patterns of the teachers studied on the 16 personality
factor test suggest that "It is possible that personality traits need to be
balanced in a certain way for the teacher to be superior. Lacking
this balance, perhaps the teacher is likely to be only average; with
a certain makeup she may be poor." Considering the results of the
factor analysis of the responses to this test, Lamke concluded:

"Using Cattell's terminology, it appears that good teachers are
likely, more than poor teachers, to be gregarious, adventurous,
frivolous, to have abundant emotional responses, strong artistic or
sentimental interests, to be interested in the opposite sex, to be
polished, fastidious and cool. Poor teachers are more likely than
good teachers to be shy, cautious, conscientious, to lack emotional
response and artistic or sentimental interests, to have a
comparatively slight interest in the opposite sex, to be clumsy, easily
pleased, and more attentive to people." (Lamke, Reference 189.)

Other measures, related to personality tests, which have been
studied by a number of investigators, are those pertaining to various aspects
of social adjustment. The results of the studies dealing with these
variables are shown in Table 36. It will be seen that most of the
correlation coefficients found between social adjustment measures and other
measures of instructor effectiveness tend to cluster around zero. Some
exceptions are evident in the case of the Washburne Social Adjustment
Inventory and Jackson's Social Proficiency Test. Correlations ranged
from .»aO' reported by Gotham to -.60 found by Schwartz when scores on the
Washburne inventory were correlated with ratings. LaDuke obtained a
correlation coefficient of -.3? when he correlated scores on the Jackson
test with pupil gains. The extreme variability of results found with the
Washburne Social Adjustment Inventory and the generally insignificant
relationships shown by other "social" tests suggest that such measures
have little to contribute as predictors of instructor effectiveness.

Results obtained with personality tests of teachers have in general
shown wide variation when correlated with measures of teacher
effectiveness. Correlations range from rather large positive or negative
relationships to zero or near zero relationships depending upon the
particular situation and the teacher measures used. There are many
conceivable kinds of effectiveness even for teachers of the same subject or
grade level in the same kind of community and therefore there will probably
be different patterns of teacher personality for such effectiveness. As
Lamke and others have pointed out, success in teaching may be a "balance"
and to predict success it may be necessary to understand what is required
for the balance. Study of the association of traits, one by one, with
success will not suffice. The problem of determining the personality
patterns of the effective teachers still remains unsolved, despite the fact
that some so-called personality (and other) measures apparently show
significant correlations (either positive or negative) with certain measures
of instructor effectiveness. Carefully controlled, well-designed studies
employing adequate numbers of instructors are needed to determine what
measures or combinations of measures have definite predictive value.

There is probably an even greater need for the development of adequate
rationales, frameworks and systems of hypotheses which are based on 
the best available theories concerning social interaction, interpersonal 
relationships, motivation, and learning. Through research effort these 
theories may then be related to specified dimensions of teacher personal- 
ity and performance. 



IMPLICATIONS FOR FURTHER RESEARCH 

After scrutiny of several hundred research studies pertaining more or 
less directly to the identification of instructor effectiveness, the re- 
viewers have arrived at certain conclusions with respect to the areas in 
which further research is needed and in which the probabilities of securing
worth-while results appear greatest. In certain other areas, however, the
available studies seem to demonstrate beyond reasonable doubt that research
has already proceeded for a considerable distance up a blind alley. The
problems which in the opinion of the reviewers appear worthy of further
research fall into both the main categories into which the review is
organized: those problems having to do with the search for more adequate
criteria of instructor effectiveness and those problems concerning discovery
or improvement of predictors of the criteria.



Criterion Research 



The changes induced in the students by the instructor appear to
constitute the most important component of any criterion of instructor
effectiveness. As Orleans et al. (249), Evans (382), and others have pointed
out, the ideal criterion of the effective instructor is probably a composite
of several measures. For Air Force instructors it seems obvious that the
relative gains in subject-matter knowledge of groups of students under
different instructors should be a most important element in this composite.
The Air Force technical schools, because of the large numbers of personnel
instructing in the same subject-matter fields, offer an ideal situation in
which to make a thorough investigation of this criterion.

The results obtained from any simple use of raw gains scores are
certain to be misleading. The adequate use of the gains criterion requires
the control of such variables as student aptitude, ability and motivation,
the effects of distractions, diverse classroom conditions, cultural
differences in different localities, and the like.
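
One simple way to picture the control of a single such variable is a
residual gain: regress raw gain on student aptitude across all students,
then average what is left over for each instructor. The sketch below is
only an illustration under stated assumptions. The function name
`adjusted_gains`, the record layout, and the sample figures are all
hypothetical, and a real study would need to control many more variables
than aptitude alone.

```python
from collections import defaultdict
from statistics import mean


def adjusted_gains(records):
    """Mean aptitude-adjusted gain per instructor.

    records: list of (instructor, aptitude, raw_gain) tuples, one per student.
    Fits raw_gain = a + b * aptitude by ordinary least squares over all
    students, then returns each instructor's mean residual, i.e. the gain
    remaining after the linear effect of aptitude is removed.
    """
    xs = [aptitude for _, aptitude, _ in records]
    ys = [gain for _, _, gain in records]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = defaultdict(list)
    for instructor, aptitude, gain in records:
        residuals[instructor].append(gain - (intercept + slope * aptitude))
    return {name: mean(values) for name, values in residuals.items()}


# Hypothetical data: two instructors, two students each.
sample = [
    ("A", 10, 7.0), ("A", 20, 12.0),
    ("B", 10, 3.0), ("B", 20, 8.0),
]
print(adjusted_gains(sample))
```

A residual of this kind removes only the linear effect of the one variable
entered; motivation, classroom conditions, and the other influences named
above would each need their own measure before the adjusted figures could
be trusted.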

The reliability of a measure of instructor effectiveness should be the
reliability of that effect on different or successive classes and not the
split-half reliability determined from the same class, in which situational
and temporal variance (more properly viewed as error variance) increases
the estimated reliability. This involves rather elaborate design and
statistical manipulation much beyond the scope of the average school system
or the average supervisor's capabilities. As a practical measurement
device, apart from its use in an experimental situation, the measurement of
student gains affords a costly, unwieldy, and laborious method of
evaluating instructors. If it can be shown that student gains correlate
adequately with some other more easily obtained measures, these latter could
be used for most research and administrative purposes as substitutes.
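
The distinction between reliability across classes and split-half
reliability within one class can be made concrete with a small sketch. It
assumes, hypothetically, that each instructor has a mean student gain
recorded for two successive classes, and it estimates reliability as the
product-moment correlation between the two sets of means. The function
names and the figures are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def between_class_reliability(class_means):
    """class_means: {instructor: (mean_gain_class_1, mean_gain_class_2)}.

    Correlates instructors' mean gains in one class with their mean gains
    in a different class, so that conditions peculiar to a single class
    count against the estimate rather than inflating it.
    """
    first = [pair[0] for pair in class_means.values()]
    second = [pair[1] for pair in class_means.values()]
    return pearson(first, second)


# Hypothetical mean gains for four instructors in two successive classes.
means = {"A": (4.0, 3.5), "B": (6.0, 6.5), "C": (5.0, 4.0), "D": (7.0, 7.5)}
print(round(between_class_reliability(means), 2))
```

A split-half coefficient computed within one class would share that class's
particular conditions between both halves, which is why it tends to
overstate how consistently an instructor produces gains.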

The demand continues for more objective measures to be used for
instructor selection and evaluation. Precise methods of direct observation
have been little used in determining instructor effectiveness, probably
because of the inherent difficulties in their application. Such
observations require study as potential predictors of other criteria of
instructor performance; measures of observable behavior which turn out to be
valid could then, in turn, be further used as criteria for future research or
for practical application as evaluation indexes. Exploratory studies designed
to investigate various techniques of instructor observation are thus ur- 
gently needed. The utilization of tape recorders, photographic, and other 
recording devices in connection with observation of instructors has not 
been thoroughly investigated. While some work has been directed toward 
observing instructors in a classroom situation there appear to be few, 
if any, studies of methods for making reliable observations of instruc- 
tor and student behavior in the laboratory or shop. In this connection 
the methods of Olson and Wilkinson (24#), by means of which they attempted 
to determine differences among teachers in terms of the amount and kind 
of verbal direction used in controlling behavior of elementary school 
pupils, appear worthy of further investigation. Their techniques, if 
modified to suit adult students, might well produce results of value in 
the evaluation of instructors and instructional methods in Air Force 
technical training schools. Observation to be of research value,
however, must be repeatable by other scientists. Judgments of instructors
that depend for their accuracy on the intuition or diagnostic skill of a 
lone observer are not adequate data for research. This may mean that 
every possibility of success is eliminated, but it still remains to be 
demonstrated that behaviors which can be reliably observed by different 
observers and which are reliably associated with different occasions (are 
typical of the instructor) are not related to effectiveness. 

The relatively high coefficients obtained by Shannon (307), when he 
correlated student attention scores with scores on achievement tests, also
suggest a lead which might prove useful if applied to students in an Air 
Force situation, despite Shannon's rather low opinion of his findings. (See 
the section on Objective Observation of Instructor Performance.) 

There are, however, other aspects of the instructor's performance that 
may play some part in his over-all effectiveness as a member of a group 
with a common goal. For instance, the instructor has certain administrative 
and clerical responsibilities that, while they do not add to student gains, 
are important to the orderly administration of the training courses. Fur- 
ther, it is possible for instructors to contribute to a greater or lesser 
degree to improvement of the curriculum and to the development and promo- 
tion of better methods of presentation. Estimates of the extent to which 
different instructors make such contributions are probably best obtained 
from supervisors' ratings of instructors. 

Additionally, it seems possible that the behaviors and expressed atti- 
tudes of the instructor could have a marked effect on the willingness of 
both his students and fellow instructors to work together to accomplish a 
group mission. In other words, the influence of the instructor on school 
morale may also be an aspect of his effectiveness. This aspect would
probably best be reflected in ratings of the instructor made by his fellow
instructors and by his students. Nothing is known of the amount of
interrelationship or the extent of independent reliable variance likely to be
found in such measures in Air Force schools. Considerable research effort 
would be necessary to determine the weightings that should be used in any 
composite criterion of instructor effectiveness. 

The generally unsatisfactory nature of past rating methods has stimulated
the search for more satisfactory techniques. Among rating methods the
forced-choice technique evidently offers some promise for operational use
since it tends to reduce biasability. Considerable research would be
required, however, to determine the value of forced-choice scales devised
for use by student raters, fellow teachers, or as self-rating scales. It
must be determined also whether or not repeated ratings on forced-choice
forms, like those on graphic scales, tend to become progressively more
lenient and less valid.



Little practical use has been made of fellow teacher ratings in
civilian institutions. While an instructor's opinions of his fellow
instructors may be biased, it is more than probable that through his
day-to-day, close contacts with them he knows what kind of instructors they
are. His relationships with his fellow instructors, being different from
those of the supervisors or students, will enable him to know them in a
somewhat different way and his judgments of them will be based on this
different point of view. Peer ratings of instructors in the Air Force should
receive further investigation, either through forced-choice or other
methods, in the expectation that they might be used to corroborate
supervisor ratings or as a part of a composite to bring about a more
adequate rating of instructors than supervisor ratings used alone.



Student ratings are being used more and more widely in civilian schools
and colleges, a trend in keeping with the present day tendencies to give
greater emphasis to the democratic process in education. The argument is
frequently advanced that in the Air Force technical schools the phases are
so short that the student has insufficient time to get well enough
acquainted with his instructor to make adequate judgment of him. The total
hours an Air Force technical school student spends with his instructor,
however, are often considerably greater than the time a college student
spends with his instructor during a one semester college course. As in
the case of peer ratings, student ratings have played no great part in
the evaluation of Air Force technical training school instructors.
Thorough study would be required to determine their utility for
self-improvement of instructors and also to discover their value as a
criterion per se or as a predictor of gains or other criteria of instructor
effectiveness.



It is possible that the use of a composite criterion will obscure
patterns and significant elements or specific aspects of effectiveness. It
may be difficult to add together, say by means of regression equation
techniques, different components of teacher effectiveness so that a high
degree of one component is allowed to counterbalance a low degree of another,
when both may be equally important in their own way. Thus, it may be
necessary to develop new ways of combining or otherwise utilizing several
criteria. The development of such a composite will require the best available
judgment on the part of psychologists and school administrators as to the
relative weights to be assigned considering the interrelations found.
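
The counterbalancing problem can be seen in a small sketch. It compares a
weighted sum, in which a high score on one component offsets a low score
on another, with a rule requiring a minimum on every component. The
minimum-score rule is offered only as one hypothetical example of a
non-compensatory combination and is not a method proposed in the studies
reviewed; the component names, weights, cutoffs, and scores are likewise
assumed for illustration.

```python
def weighted_composite(scores, weights):
    """Compensatory rule: a weighted sum of standardized criterion scores."""
    return sum(weights[name] * scores[name] for name in weights)


def meets_all_cutoffs(scores, cutoffs):
    """Non-compensatory rule: every component must reach its own minimum."""
    return all(scores[name] >= cutoffs[name] for name in cutoffs)


# Hypothetical standardized scores on two criterion components.
uneven = {"student_gains": 2.0, "peer_rating": -2.0}
steady = {"student_gains": 0.0, "peer_rating": 0.0}
weights = {"student_gains": 0.5, "peer_rating": 0.5}    # assumed equal weights
cutoffs = {"student_gains": -1.0, "peer_rating": -1.0}  # assumed minimums

for label, scores in (("uneven", uneven), ("steady", steady)):
    print(label,
          weighted_composite(scores, weights),
          meets_all_cutoffs(scores, cutoffs))
```

With these figures the two instructors receive the same weighted composite
although only one of them reaches the minimum on both components, which is
the kind of pattern a single composite score can hide.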

Since many of the studies reviewed have been concerned with ratings, in
the foregoing discussion of implications for further research on criteria,
the reviewers have emphasized methodological considerations. The major
problems of research on criteria, however, may not be methodological but
rather conceptual or definitional problems. The objectives of training
programs need to be defined, students' achievements of those objectives
insofar as they can be measured need to be ascertained, and the effects of
instructors on these achievements need to be isolated.

It is contended by some educational authorities that no kind of rating
on any kind of scale by any kind of person is likely to provide an
acceptable criterion until it can be shown to be related to student change in
the direction of the educational objectives of the school or training
program. This is an extreme position which would appear to rule out, as
unacceptable, ratings which tend to show negligible correlations with
student gains. While it appears reasonable that measurable student changes
should constitute a part (perhaps the largest part) of a total criterion of
teacher effectiveness, it is also possible that ratings may reflect areas
of effectiveness not directly measurable. The question of whether ratings
are acceptable as a part of a total criterion depends on whether there are
logical grounds for believing that the teacher can contribute to the
accomplishment of school objectives in addition to his effects on his own
students. If it seems possible for teachers to contribute differentially
to the group efforts through work on the curriculum, through development of
improved methods, through their influence on group morale, etc., then these
contributions should be a part of any total criterion of effectiveness.
If it likewise seems possible that ratings might reflect the quality of a
teacher's participation in the group effort, then the use of ratings as an
element in a total criterion of effectiveness is justified.

To the reviewers the major problem connected with ratings is not the
justification of their use, but rather the improvement of their accuracy.



Predictor Research

Research on predictors necessitates formulation of hypotheses and the 
development of conceptual frameworks based on the best available psychologi- 
cal and educational theories. These hypotheses will reflect the rationale 
that certain traits or behavior of an instructor may be expected to be
related to and hence may be used as predictors of instructor competence. For
example, hypotheses might be set up with respect to the relation of
instructors' intelligence to instructor effectiveness for different kinds of
subject matter. Similar hypotheses might be generated for age, experience,
extracurricular activities, sex, verbal facility, and other instructor vari- 
ables. 

The differential relations of instructor intelligence to instructor
effectiveness for different kinds of subject matter should be determined.
Likewise the optimal relations between instructor intelligence and the
aptitude and experience levels of students should be investigated. The
student gains criterion might be used to determine the value of
intelligence as a predictor of instructors' competence in courses of differing
complexity. It is quite possible that the intelligence factor when used 
with other instructor measures might contribute materially to an instruc- 
tor selection battery. 

A number of investigators have shown a relationship between instruc- 
tor effectiveness and age or experience which appears to be curvilinear. 
Teachers tend to reach maximum rated efficiency after five or more years 
of teaching experience. In the Air Force, however, extremely few (approxi- 
mately two per cent) airman instructors remain in a teaching assignment 
for as long as five years. If the Air Force is indeed losing the majority 
of airman instructors before they reach their period of maximum efficiency, 
a change in policy might be anticipated. 

The evidence suggests that the kind and number of activities a
teacher has engaged in may have some relation to his effectiveness as a
teacher. This finding, as shown with respect to certain specific
extracurricular activities in the case of some civilian school teachers, might also
apply to Air Force technical school instructors. A study might be made 
to determine if past participation and interest in specific activities 
(or in many varied activities) are related to an instructor’s success in 
training student airmen to become proficient in varied technical school 
specialties. 

No fundamental differences in instructional effectiveness between men 
and women teachers have been demonstrated. Although these findings were 
obtained in quite different training situations from Air Force technical 
courses, the possibility of utilizing WAF instructors should not be over- 
looked. 

The rather interesting findings of McCoard (210) with respect to verbal
facility suggest several potentially fruitful areas of research: (a) to
determine the relationship between the verbal facility and technical
information an instructor shows in the classroom as compared with his ability
to demonstrate equipment and procedures in the technical laboratory or shop;
(b) to determine the extent to which an instructor's ability to organize
and present verbal material is related to the subject-matter gains of his
students; (c) to find out if verbal facility can be measured and used as
part of the instructor selection procedure. 

The investigations of factor analysis of instructor abilities so far 
available are somewhat vitiated due to inadequacies of criteria, measuring 
devices, or numbers of cases used. A more adequately designed investiga- 
tion might yield factorial results which might prove of considerable value 
toward the solution of instructor selection and evaluation problems in the 
Air Force. 

The personality patterns of the successful instructors have not yet 
been determined. This does not mean, however, that this approach should be 
abandoned. Carefully controlled, well-designed experiments employing
adequate numbers of instructors would be needed in which plausible measures
or combinations of such measures are investigated. Certain tests used in
preliminary studies have shown promise. These should be used in more
thoroughgoing experiments. The search should continue also for new and untried
measuring instruments in the hope that some device will be discovered which
will enable the Air Force to predict teaching success of instructors in
training and to evaluate instructors on the job.

It should be pointed out that the importance of many of the problems
suggested by this Research Bulletin has been recognized by the Air Force Per- 
sonnel and Training Research Center, and preliminary experiments in several 
of these areas are now underway. 